For what they were... we are: Finally some improved knowledge of haplogroup R1a1 (Y-DNA)

October 31, 2012

Finally some improved knowledge of haplogroup R1a1 (Y-DNA)

Haplogroup R1a (most of which is R1a1), dominant in northern South Asia and Eastern Europe as well as in much of Central Asia, has been giving headaches to population geneticists, academic and amateur alike. Because key downstream markers had not been identified, most of the haplogroup looked like an amorphous goo, the same in India as in Europe. It seems that this may change now:

Horolma Pamjav et al., Brief communication: New Y-chromosome binary markers improve phylogenetic resolution within haplogroup R1a1. AJPA 2012. Pay per view ··> doi:10.1002/ajpa.22167

Abstract

Haplogroup R1a1-M198 is a major clade of Y chromosomal haplogroups which is distributed all across Eurasia. To this date, many efforts have been made to identify large SNP-based subgroups and migration patterns of this haplogroup. The origin and spread of R1a1 chromosomes in Eurasia has, however, remained unknown due to the lack of downstream SNPs within the R1a1 haplogroup. Since the discovery of R1a1-M458, this is the first scientific attempt to divide haplogroup R1a1-M198 into multiple SNP-based sub-haplogroups. We have genotyped 217 R1a1-M198 samples from seven different population groups at M458, as well as the Z280 and Z93 SNPs recently identified from the “1000 Genomes Project”.

The two additional binary markers present an effective tool because now more than 98% of the samples analyzed assign to one of the three sub-haplogroups. R1a1-M458 and R1a1-Z280 were typical for the Hungarian population groups, whereas R1a1-Z93 was typical for Malaysian Indians and the Hungarian Roma. Inner and Central Asia is an overlap zone for the R1a1-Z280 and R1a1-Z93 lineages. This pattern implies that an early differentiation zone of R1a1-M198 conceivably occurred somewhere within the Eurasian Steppes or the Middle East and Caucasus region as they lie between South Asia and Eastern Europe. The detection of the Z93 paternal genetic imprint in the Hungarian Roma gene pool is consistent with South Asian ancestry and amends the view that H1a-M82 is their only discernible paternal lineage of Indian heritage.
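The "more than 98%" figure is simple bookkeeping: each R1a1-M198 sample goes to whichever of the three sub-haplogroups it is derived for, and a sample derived for none of them stays in the paragroup R1a1-M198*. The following is a minimal sketch of that bookkeeping, not the authors' pipeline. The function names and the genotype calls are hypothetical, and it assumes, as in the current ISOGG tree, that the M458, Z280 and Z93 clades are mutually exclusive.

```python
# Hypothetical helper, not the paper's method: assign an R1a1-M198 sample to
# one of the three sub-haplogroups genotyped in Pamjav et al. (2012).
SUBCLADE_MARKERS = ("M458", "Z280", "Z93")
PARAGROUP = "R1a1-M198*"  # derived for M198 but for none of the tested subclades


def assign_subhaplogroup(derived_snps: set) -> str:
    """Return the sub-haplogroup label for a sample already known to be M198+.

    `derived_snps` holds the names of SNPs at which the sample carries the
    derived allele. The three subclades are assumed to be mutually exclusive.
    """
    hits = [m for m in SUBCLADE_MARKERS if m in derived_snps]
    if len(hits) > 1:
        raise ValueError(f"conflicting derived markers: {hits}")
    return f"R1a1-{hits[0]}" if hits else PARAGROUP


def assigned_fraction(samples: list) -> float:
    """Share of samples that fall into one of the three named subclades."""
    labels = [assign_subhaplogroup(s) for s in samples]
    return sum(label != PARAGROUP for label in labels) / len(labels)


# Made-up genotype calls for four M198+ samples.
samples = [{"M198", "M458"}, {"M198", "Z93"}, {"M198"}, {"M198", "Z280"}]
print([assign_subhaplogroup(s) for s in samples])
print(assigned_fraction(samples))  # 3 of 4 samples assigned
```

Rejecting a sample derived for two of the markers, rather than picking one, is a deliberate choice: under the assumed tree such a call can only be a genotyping error.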

Not having access to the paper right now, I can't say much more, but I believe the abstract alone is already very informative.

Distribution of R1a per Underhill 2010

Update:

Fig. 1 - MJ trees

A reader has already sent me a copy of the paper, and I think it has two sides:

On one side, the paper effectively detects these markers and studies them, along with R-M458, in Hungarians and related ethnic groups (Csangos, Szeklers, Hungarian Roma), as well as in Malaysian Indians, Uzbeks and Mongols. This part is informative, even if the Asian populations selected may not be the best choice (Mongols are low in R1a, and so are Tamils, who make up the bulk of Malaysian Indians).

On the other side, the authors try to read too much into the data, not just into these haplogroups but especially into molecular-clock-o-logic estimates (based on the Zhivotovsky mutation rate, now considered obsolete even by molecular-clock enthusiasts). A corrected age estimate would be roughly twice as old [ref 1, ref 2], which means that neither the Kurgan expansion nor the Neolithic one could account for its arrival in Europe.

Even Underhill's age estimates, once duly corrected, would imply at least Last Glacial Maximum (LGM) dates for the arrival in Europe. The authors' own dates, after the same x2 correction, give Late Upper Paleolithic dates for the haplogroups studied here.
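The x2 correction rests on the inverse relation between an age estimate and the mutation rate assumed. As a generic sketch (not the specific estimator used in the paper), let D be the divergence accumulated within a clade, for instance the mean STR variance, and μ the assumed mutation rate per generation:

```latex
\hat{T} = \frac{\hat{D}}{\mu}
\qquad\Longrightarrow\qquad
\hat{T}' = \frac{\hat{D}}{\mu'} = \hat{T}\,\frac{\mu}{\mu'},
\qquad\text{so}\quad \mu' = \tfrac{1}{2}\,\mu \;\Rightarrow\; \hat{T}' = 2\,\hat{T}.
```

The factor depends only on the ratio of the old rate to the new one. Converting generations to years multiplies both estimates by the same generation time, so that choice cancels out.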

The authors also insist on arguing against a South Asian origin of R1a1 (Underhill 2010) with what sound like weak and fallacious arguments:

Previous publications have pointed out that regions of highest haplogroup frequencies do not always indicate the territory of origin (Cinnioglu et al., 2004) and high STR diversity may not be exclusively an indicator of in-situ diversification but could also be the consequence of repeated gene flow from different sources (Zerjal et al., 2002; Sharma et al., 2009).

Basically they are nagging: "Underhill could hypothetically be wrong in his conclusions but we have no evidence whatsoever that he is - just saying".

The real reason seems to be that they hope to find a more westerly origin for the lineage and once again attribute it to Indo-European expansions, in line with classic speculations for which the high STR diversity levels in South Asia are a big problem. However, it is most unlikely that a bunch of horse-riding nomads could so radically alter the genetic landscape of the whole subcontinent, all the more so when its agriculture was already fully developed and no doubt sustained high population densities.
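The STR diversity at stake is commonly summarized as the variance of repeat counts at each STR locus, averaged over loci. The following minimal sketch uses made-up haplotypes (three hypothetical loci per man, not data from the paper). It also shows the caveat in the quoted passage: pooling two internally homogeneous groups from different sources raises the variance even though neither group diversified in situ.

```python
from statistics import pvariance


def str_variance(haplotypes):
    """Mean per-locus variance of STR repeat counts.

    Each haplotype is a sequence of repeat counts, one per locus, with the
    loci in the same order for every haplotype.
    """
    loci = list(zip(*haplotypes))  # transpose: one tuple of counts per locus
    return sum(pvariance(locus) for locus in loci) / len(loci)


# Hypothetical repeat counts at three loci.
old_region = [[13, 25, 16], [14, 25, 17], [12, 24, 16], [15, 26, 18]]
source_b = [[13, 25, 16], [13, 25, 16], [13, 25, 17], [14, 25, 16]]
source_c = [[16, 22, 14], [16, 22, 14], [16, 22, 15], [17, 22, 14]]

print(str_variance(old_region))           # a diverse region
print(str_variance(source_b))             # a homogeneous group
print(str_variance(source_b + source_c))  # two homogeneous sources pooled
```

In this toy case the pooled groups end up more "diverse" than the genuinely diverse region. The point is only that variance alone cannot tell the two histories apart.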

Notwithstanding all those highly questionable opinions, the discovery of new haplogroups that add to our understanding of this major lineage is a great advance.

Update:

It seems that some of the data presented in this paper had already been floating around in some circles, because ISOGG already includes the "new" haplogroups in its phylogenetic synthesis. Most interestingly, the two "European" clades (along with a third one, whose geography I don't know yet) make up a larger haplogroup (R1a1a1b1a - S198/Z282), which belongs to R1a1a1b1 (S339/Z283), the "brother" of the "Asian" one (R1a1a1b2 - S202/Z93).

As I was just commenting elsewhere, the key to the origins of R1a lies not so much in these low-level haplogroups as in the higher "asterisk" paragroup, which (from memory) used to be concentrated in Pakistan and nearby areas of India, etc.

But at the level of R1a1a1b (S224/Z645), this lineage seems to have split in two: R1a1a1b1 (S339/Z283), which we can describe as "European", and R1a1a1b2 (S202/Z93), which we can describe as "Indian".

This paper treats the European half only through two of its subclades, taken separately, which may be confusing. Hence I am adding here a synthesis of the current ISOGG phylogeny of R1a, with some annotations, for easier reference:

R1a* ··> Iran, Persian Gulf, Turkey
R1a1 (L120/M516, L122/M448, M459, Page65.2/SRY1532.2/SRY10831.2)
  R1a1* ··> Iran, Caucasus, Greece, Scandinavia
  R1a1a (L168, L449, M17, M198, M512, M514, M515)
    R1a1a* ··> where? (not clear)
    R1a1a1 (M417, Page7)
      R1a1a1* ··> where?
      R1a1a1a (L664/S298) ··> where?
      R1a1a1b (S224/Z645, S441/Z647)
        R1a1a1b* ··> where?
        R1a1a1b1 (S339/Z283)
          R1a1a1b1* ··> where?
          R1a1a1b1a (S198/Z282)
            R1a1a1b1a*
            R1a1a1b1a1 (M458) ··> Central & East Europe
            R1a1a1b1a2 (S204/Z91, S466/Z280) ··> Europe, Central Asia
            R1a1a1b1a3 (S221/Z284, S443/Z289) ··> where?
        R1a1a1b2 (S202/Z93) ··> India, Central Asia

All the data on the geography of the top-level "asterisk" paragroups come from Underhill 2010, already mentioned above. They suggest a West Asian origin for R1a overall, with spread to the west and east from the R1a1a level downward.

I used colors to emphasize the clades discussed here (purple for the larger haplogroup, blue for the European-leaning clade and red for the Indian-leaning one).

Clades in italics are "proposed", not yet consolidated.
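The alphanumeric names in the ISOGG tree encode the branching themselves. Dropping the last letter or number gives the parent clade, and an asterisk marks the paragroup of a clade's members who belong to none of its named subclades. The following minimal sketch assumes that strict alternating convention. The function names are hypothetical and are not an ISOGG tool. It recovers where two branches meet: for example, the Z280 clade (R1a1a1b1a2) and the Z93 clade (R1a1a1b2) meet at R1a1a1b (S224/Z645).

```python
import re
from typing import List, Optional

# Defining SNPs for a few clades, copied from the ISOGG synthesis above.
SNPS = {
    "R1a1a1b": "S224/Z645",
    "R1a1a1b1": "S339/Z283",
    "R1a1a1b1a": "S198/Z282",
    "R1a1a1b1a1": "M458",
    "R1a1a1b1a2": "S204/Z91, S466/Z280",
    "R1a1a1b2": "S202/Z93",
}


def parent(clade: str) -> Optional[str]:
    """Parent of an ISOGG-style name, e.g. 'R1a1a' -> 'R1a1', 'R1a1a*' -> 'R1a1a'.

    Assumes the alternating convention: the last level is a single lowercase
    letter or a run of digits. Returns None for the root ('R').
    """
    if clade.endswith("*"):  # paragroup: members of the clade outside all subclades
        return clade[:-1]
    m = re.search(r"([a-z]|\d+)$", clade)
    if m is None:
        return None
    return clade[: m.start()] or None


def lineage(clade: str) -> List[str]:
    """The clade followed by all its ancestors, up to the root."""
    chain = [clade]
    while (p := parent(chain[-1])) is not None:
        chain.append(p)
    return chain


def mrca(a: str, b: str) -> Optional[str]:
    """Most recent common ancestral clade of two clades."""
    ancestors_of_a = set(lineage(a))
    return next((c for c in lineage(b) if c in ancestors_of_a), None)


split = mrca("R1a1a1b1a2", "R1a1a1b2")  # Z280 clade vs Z93 clade
print(split, SNPS[split])
print(lineage("R1a1a1b1a1"))
```

Real ISOGG names later grew long and were revised, so this only holds for names that follow the alternating pattern shown in the tree above.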

86 comments:

Davidski, October 31, 2012 at 11:53 AM
Maju you're full of it, as usual.

Europe has much higher R1a SNP diversity than South Asia, where only Z93 is found.

Z280, M458 and ancestral lineages of Z93 all overlap in Europe. Not even in Inner or Central Asia as this paper claims.

Therefore, R1a expanded deep into Asia from somewhere in Europe.

Here's the latest tree you clown.

http://img543.imageshack.us/img543/4425/r1am198.jpg
Unknown, October 31, 2012 at 10:51 PM
It must be from the Punjabi diaspora in Malaysia.
clusteredmaps, November 1, 2012 at 12:49 AM
could you please send me a link to the site.
clusteredmaps, November 1, 2012 at 12:50 AM
my email is clusteredmaps@aol.com
and I could put the site on the blog clusteredmaps.blogspot.com
n/a, November 1, 2012 at 1:11 PM
"R1a1a1a (L664/S298) ··> where?"

Northwestern Europe.

"On the other side, the authors attempt to read too much, not just on these haplogroups but specially on molecular-clock-o-logic estimates, (based on the Zhivotovsky mutation rate, now considered obsolete even by molecular clock enthusiasts). A corrected age estimate would be roughly doubly old[ref 1, ref 2] and that means that neither the Kurgan expansion nor the Neolithic one could account for its arrival to Europe."

You seem pretty confused. Pedigree estimates of autosomal SNP mutation rates have nothing to do with Y-STR mutation rates, which have their own pedigree estimates (leading to dates much younger than with Zhivotovsky).
Ash, April 29, 2013 at 6:12 AM
I am an Indian with R1a1, as determined by the National Geographic Genographic Project, so the comment above by Davidski is weird.
drMiraculix, July 17, 2013 at 6:05 AM
Is it not logical that R1a originated in Northwestern Europe when the core area for the R haplogroup is obviously there?
Maju, July 18, 2013 at 3:02 AM
"... does higher basal diversity necessarily mean the haplogroup R originated in South Asia?"

Basal diversity is the best indicator we have for tracking the prehistory hidden in haploid phylogenies. It's not 100% certain: I guess we can always imagine extreme catastrophes that brutally reduced diversity in some places, or the differences between one region and another may not be clear and conclusive enough, but otherwise...

When migration happens, it is always a subset that migrates, not ALL the lineages of the homeland. This has been confirmed by recent data many times. So the colony tends to have less basal diversity than the motherland. There may be confounding factors if there are several motherlands (which increases diversity because of the various sources), but common sense and careful parsing of the data should also reveal that.
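The founder argument can be illustrated with a toy simulation. Draw a small group of migrants at random from a homeland where several basal lineages coexist, then count how many of those lineages the group carries. This is a minimal sketch under deliberately simple assumptions (hypothetical lineage counts, a single random draw, no later drift, growth or back-migration), not a population-genetic model.

```python
import random


def basal_diversity(sample):
    """Number of distinct basal lineages (e.g. paragroups) present in a sample."""
    return len(set(sample))


def mean_founder_diversity(homeland, n_founders, trials=1000, seed=0):
    """Average basal diversity of random founder groups drawn from `homeland`."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    total = 0
    for _ in range(trials):
        founders = rng.sample(homeland, n_founders)  # only a subset migrates
        total += basal_diversity(founders)
    return total / trials


# Hypothetical homeland: 10 basal lineages, 20 men carrying each.
homeland = [f"lineage{i}" for i in range(10) for _ in range(20)]
print(basal_diversity(homeland))             # 10
print(mean_founder_diversity(homeland, 15))  # fewer lineages, on average
```

Drawing founders from several such homelands and merging them would push the count back up, which is the confounding case mentioned in the comment.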

"But I agree that if we assume that higher basal diversity automatically indicates place of origin, then your analysis is entirely reasonable".

Thanks.

"The only way to be sure of how these populations have originated, grown and emigrated is after all to have DNA from dated human remains, but I guess that could take some time..."

Ideally it should be that way, but: (1) human remains do not always exist, (2) DNA does not always survive (especially in hot, humid areas), and (3) the desirable work of extracting and sequencing the DNA does not always happen (or is done with means too cheap to reveal all the data). Y-DNA is much harder to recover than mtDNA, so we normally have much better information on the matrilineages.

"I don't mean to be difficult, but I am having trouble accepting some of the underlying assumptions. Is it for instance impossible that the P haplogroup spread to NW Europe before mutating into R?"

Nothing is "impossible" as in "100% impossible" but there are other principles in science known as parsimony or Occam's razor: i.e. what makes good sense and what seems extremely unlikely.

So the only thing (and it is no small thing) we can say about your question is that it is extremely unlikely.

You should also consider the archaeological data, which do not really support your hypothesis: it implies ancient mass migrations from Europe into West and South Asia that are nowhere to be seen.
drMiraculix, July 24, 2013 at 2:06 PM
Hi

I noticed that in the Family Tree DNA database 'R1a and all subclades' there are now two test results from England that have been confirmed by SNP testing as R1.
They are on page 1 under the header 'ungrouped'.
The names are Pickering and Cates.
Still the only place the simple R has been found is in Pakistan and the area around there. Do you expect that the simple R will also be found in Europe, or not?
drMiraculix, July 24, 2013 at 8:39 PM
I thought that the R-M207* was the earliest version found of the R haplogroup. According to Wikipedia 'Y-chromosomes which possess the marker M207 that defines Haplogroup R-M207 but neither of the markers for its subgroups, are categorized as belonging to group R-M207*.' But according to what you are saying, there is no such 'simple' version of the R haplogroup that doesn't have markers for any subgroup. If that is true, doesn't it become very difficult to sort the different haplotypes of the haplogroup chronologically?

It is clear that the R-M207* has only been found in Pakistan and the area surrounding Pakistan, and so these results do indicate an initial appearance for the R haplogroup there. And unless the results should change significantly with time, this means that if we assume that the oldest versions of the haplogroup are still living in the same area as they were living when they first came into existence, then the haplogroup originated in Pakistan/India.
Does that not make sense?
Then we can of course endlessly debate whether the above mentioned assumption is reasonable or not. But if the R-M207* are not an older version of R, then it doesn't make much sense to say that the R are from there?
I also thought that the R1 haplotype is a simple version of R1 without any of the markers for any subgroups. Otherwise it doesn't make much sense to call it R1; they would have to find the subgroup. After all, wouldn't the same problem of chronology apply to this as to the simple R? We could not say it is older unless it is a simpler version?
drMiraculix, July 24, 2013 at 9:25 PM
Yes I know that R* represents a collection of subgroups.

So the assumption is basically that because R2 are found in India/Pakistan then that is the place R is from, and then the basal diversity seems to back that up?
And other than that it is hard to say anything about the chronology of the haplotypes in Y-DNA?

Well, it will be interesting to see how interpretations may change when researchers start working with full Y-DNA sequences, as you mentioned above. I have also heard rumours about a new technique that is capable of extracting human DNA from archaeological sites that contain human excrement. Apparently they can still find DNA in really old samples. If that is true, they could yield a lot of historical DNA information. I have only been told of this new technique, so I don't know yet if it's really as revolutionary as it sounds.
But new technologies can make for much improved knowledge in this field in the future, that's for sure.
Jovial Company, July 27, 2013 at 12:29 PM
R1a1a is widespread throughout the Indian subcontinent.
Jovial Company, July 27, 2013 at 12:34 PM
Please analyse the www.harappadna.org (or single p) data, which have a more representative sample for this region of the Indian subcontinent.
drMiraculix, July 28, 2013 at 2:45 AM
The link provided by Davidski in the first comment at the top is a very good link.
http://img543.imageshack.us/img543/4425/r1am198.jpg
The image shows the branches of the tree for R1a1a.

I also found another website with some interesting information about the R* in India, claiming that they all probably are R2 or R2a.
http://r2dnainfo.blogspot.no/2010/08/all-r2s-may-now-be-known-as-r2a.html
The site also has a new Haplogroup R tree that covers the entire R haplogroup.
Although the results indicate the R* in Pakistan/India is mostly R2 the results are not completely final, as more testing is needed.
But I think the tree that is presented is very useful, and relevant to the above discussion, don't you agree?
Doesn't this mean that the argument that haplogroup R is from Europe just got stronger?
My personal opinion is that the tree presented at the website makes it quite possible that Europe is the place. But I also see that India is still a possibility, and I have not reached any absolute conclusion yet. Given all the haplogroups that have appeared in India, it would not surprise me if India was the origin; however, the division of Europe into West (R1b) and East (R1a) still makes me doubt the Indian origin.
As for your above comments on me about Africa, I would say it is rather rude to insinuate that I am a racist just because I think the human species originated outside Africa. In any case the oldest haplogroups A and B are found in Africa today, so the fact that those people are in a sense 'who our ancestors were' is of course not changed by the fact that the earliest humans may have appeared outside Africa. A and B are still the oldest haplogroups. So screaming racism(!) is just not called for.
layman, November 22, 2016 at 12:07 AM
Well, if you look at the frequency and distribution map of R1a1a1, it becomes possible to infer migratory patterns, which do appear to run north to south and west to east. This corresponds to folklore and linguistic elements.
Nothing Fail, May 26, 2018 at 6:03 PM
It is not the Eastern R1a Y DNA which is de facto Slavic but the oldest Y DNA in South East Europe and Europe in general - I2 and in particular I2a1 which formed about 15 000 years ago in the BaLKans - Ukraine region, that is the land of the Thracians. Another BaLKan Y DNA which expanded North and North East is E-v13. Ukraine, considered the homeland of the Slavs from where they went South, differs from Poland, Belarus and (Great-) Russia (Muscovy) by a lower R1a percentage and higher BaLKan Y DNA (I2a1 & E-v13). At the same time we must say that the oldest examples of R1a found in BuLGaria (Varna Necropolis) are older than 8000 years, much earlier than the Kurgan culture of R1a and R1b people of the Black Sea - Caspian Sea steppe (prairie) region. E-v13 comes at 55% among Albanians, 24% - BuLGarians and 8% among Ukrainians, I2a1 - 21% among both BuLGarians and Ukrainians, R1a - 44% among Ukrainians and 18% among BuLGarians and other BaLKan peoples in general. Linguistically there is a number of words of the "Northern" Slavs, referred to by the Muscovites as "Polish", which BuLGarians and other "Southern" (BaLKan) Slavs, and (Great-) Russians (Muscovites), for their language was developed on the base of Church - Sclavonic Old BuLGarian, do not understand, which could be explained by the fact that those in the North have a much higher percentage of R1a (60% in Poland and Muscovy), that is the "Polish" vocabulary is mostly from the carriers of R1a, while the Muscovites' language was BuLGarized by Church Sclavonic and Ukrainian remained in between, though it sounds much more Polish today but if you open the dictionaries you are to find many BuLGarian words not used by the Polish and the Muscovites.