October 31, 2012

Finally some improved knowledge of haplogroup R1a1 (Y-DNA)

Haplogroup R1a, most of which is R1a1, is dominant in northern South Asia and Eastern Europe, as well as in much of Central Asia. It has been giving headaches to population geneticists, academic and amateur alike, because key markers had not been identified, making most of the haplogroup look like an amorphous goo, the same in India as in Europe. It seems that this may change now:

Horolma Pamjav et al., Brief communication: New Y-chromosome binary markers improve phylogenetic resolution within haplogroup R1a1. AJPA 2012. Pay per view ··> LINK [10.1002/ajpa.22167]

Abstract


Haplogroup R1a1-M198 is a major clade of Y chromosomal haplogroups which is distributed all across Eurasia. To this date, many efforts have been made to identify large SNP-based subgroups and migration patterns of this haplogroup. The origin and spread of R1a1 chromosomes in Eurasia has, however, remained unknown due to the lack of downstream SNPs within the R1a1 haplogroup. Since the discovery of R1a1-M458, this is the first scientific attempt to divide haplogroup R1a1-M198 into multiple SNP-based sub-haplogroups. We have genotyped 217 R1a1-M198 samples from seven different population groups at M458, as well as the Z280 and Z93 SNPs recently identified from the “1000 Genomes Project”.
The two additional binary markers present an effective tool because now more than 98% of the samples analyzed assign to one of the three sub-haplogroups. R1a1-M458 and R1a1-Z280 were typical for the Hungarian population groups, whereas R1a1-Z93 was typical for Malaysian Indians and the Hungarian Roma. Inner and Central Asia is an overlap zone for the R1a1-Z280 and R1a1-Z93 lineages. This pattern implies that an early differentiation zone of R1a1-M198 conceivably occurred somewhere within the Eurasian Steppes or the Middle East and Caucasus region as they lie between South Asia and Eastern Europe. The detection of the Z93 paternal genetic imprint in the Hungarian Roma gene pool is consistent with South Asian ancestry and amends the view that H1a-M82 is their only discernible paternal lineage of Indian heritage.

Not having access to the paper right now, I can't say much more but I believe that the abstract alone is very informative already.

Distribution of R1a per Underhill 2010
 


Update:

Fig. 1 - MJ trees
A reader already sent me a copy of the paper and I think that it has two aspects:

On one side the paper effectively detects these markers and studies them, together with R-M458, in Hungarians and related ethnic groups (Csangos, Szeklers, Hungarian Roma), as well as in Malaysian Indians, Uzbeks and Mongols. This part is informative, even if the selected Asian populations may not be the best choice (Mongols are low in R1a and so are Tamils, who make up the bulk of Malaysian Indians).

On the other side, the authors attempt to read too much, not just into these haplogroups but especially into molecular-clock-o-logic estimates (based on the Zhivotovsky mutation rate, now considered obsolete even by molecular clock enthusiasts). A corrected age estimate would be roughly twice as old[ref 1, ref 2], and that means that neither the Kurgan expansion nor the Neolithic one could account for its arrival in Europe.

Even Underhill's age estimates would imply at least LGM dates for the arrival in Europe once the due correction is applied. Their own dates, after the due x2 correction, give Late Upper Paleolithic dates for the haplogroups researched here.
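The "x2 correction" mentioned above is plain arithmetic if one assumes, as a rough sketch, that an STR-based age estimate is inversely proportional to the mutation rate used: halve the rate and the age doubles. The function name and the input numbers below are hypothetical, chosen only for illustration; they are not values from the paper.

```python
def rescale_age(age_years: float, rate_used: float, rate_corrected: float) -> float:
    """Rescale an age estimate to a different assumed mutation rate.

    Assumes a simple linear clock (age inversely proportional to rate),
    which is a simplification of how real estimates behave.
    """
    if rate_used <= 0 or rate_corrected <= 0:
        raise ValueError("mutation rates must be positive")
    return age_years * rate_used / rate_corrected


# Hypothetical inputs: a 10,000-year estimate re-read with a rate half as fast.
# Rates are in arbitrary units; only their ratio matters here.
print(rescale_age(10_000.0, rate_used=2.0, rate_corrected=1.0))  # prints 20000.0
```

Only the ratio between the two rates enters the result, so the whole disagreement comes down to which rate one accepts.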

Also the authors insist on arguing against a South Asian origin of R1a1 (Underhill 2010) on what sound like weak and fallacious arguments:

Previous publications have pointed out that regions of highest haplogroup frequencies do not always indicate the territory of origin (Cinnioglu et al., 2004) and high STR diversity may not be exclusively an indicator of in-situ diversification but could also be the consequence of repeated gene flow from different sources (Zerjal et al., 2002; Sharma et al., 2009).

Basically they are nagging: "Underhill could hypothetically be wrong in his conclusions but we have no evidence whatsoever that he is - just saying". 

The real reason is that they seem to hope to find a more westerly origin for the lineage and attribute it again to Indo-European expansions, in line with classic speculations for which the high South Asian STR diversity levels are a big problem. However it is most unlikely that a bunch of horse-riding nomads could so radically alter the genetic landscape of the whole subcontinent, more so when its agriculture was already fully developed, no doubt sustaining high population densities.

But notwithstanding all those highly questionable opinions, the discovery of new haplogroups adding to our comprehension of this major lineage is a great advance.


Update: 

It seems that some of the data presented in this paper was already floating around in some circles because ISOGG already includes the "new" haplogroups in its phylogenetic synthesis. Most interestingly the two "European" clades (along with a third one, whose geography I do not know so far) make up a larger haplogroup (R1a1a1b1a - S198/Z282), which is "brother" of the "Asian" one (R1a1a1b2 - S202/Z93).

As I was just commenting elsewhere, the key to the origins of R1a is not so much in these low-level haplogroups but in the higher "asterisk" paragroups, which (from memory) used to be concentrated in Pakistan and nearby areas of India, etc.

But once the level of R1a1a1b1 (S339/Z283) is reached, this lineage seems to have split in two: one branch which we can describe as "European" and another which we can describe as "Indian".

The European half is treated in this paper only as two of its subclades, each handled separately, which may be confusing. Hence I am adding here a synthesis of the current ISOGG phylogeny of R1a, with some annotations, for easier reference:

  • R1a* ··> Iran, Persian Gulf, Turkey
  • R1a1 (L120/M516, L122/M448, M459, Page65.2/SRY1532.2/SRY10831.2)
    • R1a1* ··> Iran, Caucasus, Greece, Scandinavia
    • R1a1a (L168, L449, M17, M198, M512, M514, M515)
      • R1a1a* ··> where? (not clear)
      • R1a1a1 (M417, Page7)
        • R1a1a1* ··> where?
        • R1a1a1a (L664/S298) ··> where?
        • R1a1a1b (S224/Z645, S441/Z647)
          • R1a1a1b* ··> where?
          • R1a1a1b1 (S339/Z283)
            • R1a1a1b1* ··> where?
            • R1a1a1b1a (S198/Z282)
              • R1a1a1b1a*
              • R1a1a1b1a1 (M458) ··> Central & East Europe
              • R1a1a1b1a2 (S204/Z91, S466/Z280) ··> Europe, Central Asia
              • R1a1a1b1a3 (S221/Z284, S443/Z289) ··> where?
            • R1a1a1b2 (S202/Z93) ··> India, Central Asia

All the data on the geography of top level "asterisk" paragroups is from Underhill 2010, already mentioned above. It suggests a West Asian origin for R1a overall and a spread west and east from the R1a1a level or lower.

I used colors to emphasize the clades discussed here (purple for the larger haplogroup, blue for the European-leaning clade and red for the Indian-leaning one).

Clades in cursive are "proposed", not yet consolidated.
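For readers who want to place their own test results on the tree above, here is a minimal sketch in Python. The marker lists are copied from the annotated tree; the function name is my own, and the rule "longer name means further downstream" is an assumption based on the usual longhand naming convention, so take it as an illustration and not as an official ISOGG tool.

```python
from typing import Iterable, Optional

# Clade -> defining markers, copied from the annotated tree above.
CLADE_MARKERS = {
    "R1a1": ["L120", "M516", "L122", "M448", "M459",
             "Page65.2", "SRY1532.2", "SRY10831.2"],
    "R1a1a": ["L168", "L449", "M17", "M198", "M512", "M514", "M515"],
    "R1a1a1": ["M417", "Page7"],
    "R1a1a1a": ["L664", "S298"],
    "R1a1a1b": ["S224", "Z645", "S441", "Z647"],
    "R1a1a1b1": ["S339", "Z283"],
    "R1a1a1b1a": ["S198", "Z282"],
    "R1a1a1b1a1": ["M458"],
    "R1a1a1b1a2": ["S204", "Z91", "S466", "Z280"],
    "R1a1a1b1a3": ["S221", "Z284", "S443", "Z289"],
    "R1a1a1b2": ["S202", "Z93"],
}


def most_downstream_clade(derived_snps: Iterable[str]) -> Optional[str]:
    """Return the deepest listed clade with at least one derived marker."""
    derived = set(derived_snps)
    matches = [clade for clade, markers in CLADE_MARKERS.items()
               if derived & set(markers)]
    # Assumption: a longer longhand name sits deeper in the tree.
    return max(matches, key=len, default=None)


print(most_downstream_clade({"M198", "M417", "Z93"}))  # prints R1a1a1b2
print(most_downstream_clade({"M198", "M458"}))         # prints R1a1a1b1a1
```

A sample with none of the listed markers returns `None`, which here only means "not resolved by this table", nothing more.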

22 comments:

  1. Maju you're full of it, as usual.

    Europe has much higher R1a SNP diversity than South Asia, where only Z93 is found.

    Z280, M458 and ancestral lineages of Z93 all overlap in Europe. Not even in Inner or Central Asia as this paper claims.

    Therefore, R1a expanded deep into Asia from somewhere in Europe.

    Here's the latest tree you clown.

    http://img543.imageshack.us/img543/4425/r1am198.jpg

    Replies
    1. I'd appreciate it if you toned down your comments a bit.

      Anyhow, what percentage of South Asian R1a belongs to the sub-haplogroups that we know are mostly from that region? Because research bias matters a lot when it comes to discerning whether sub-haplogroups exist at all (you know or should know that well).

      If you can show me as a matter of fact that most South Asian R1a or R1a1 belongs to the haplogroups already discovered, I would be the first to acknowledge it because, when research is thorough, basal haplogroup diversity should be most informative, well above STR structure or estimated diversity.

      So far I don't see that and instead there have been two papers (Underhill's and another one which had found that the deepest apparent R1a sublineages were from South Asia as well - can't recall the author right now but I'm sure you will).

      I understand that R1a expanded in the Upper Paleolithic, the same as relatives R1b and Q, in the context of the colonization of West Eurasia (including Central Asia). But prove me wrong, of course.

      I believe we already discussed that, in parts of Europe, it looks like R1a-M458 could be related to the Kurgan expansion (Corded Ware specifically), especially because it's found at higher frequencies in Greek Macedonia than in all the Slavic Balkans, which makes it a very unlikely "Slavic" clade.

      Anyhow, how can a lineage that allegedly expanded c. 2500 BCE be a "Slavic marker" when the Slavic expansion only happened around 700 CE, 3200 years later? You should try to be more internally consistent at the very least.

    2. There's nothing but Z93 (L342+) in South Asia (India + Pakistan). There's definitely no confirmed R1a* there at this stage.

      Europe has Z283 (which includes M458, Z280 and Z284), L664, Z93 (L342-), and R1a*.

      So you would need a major shift in results to back up what you'd like to see.

      I think the theory that R1a originated in South Asia is a dud, and it's only around due to the fact that STR diversity was wrongly assumed to mean anything in this context. R1a most likely comes from West Asia, and R1a-M417 from Europe. That probably means Z93 also comes from Europe.

    3. Actually it was easier than I thought to find the top level paragroups data, because Underhill 2010 already worked that (see supp. tables).

      R1a* is only found in West Asia (Iran, Turkey, Gulf emirates).

      R1a1* is found also in Iran and, additionally in Caucasus, Greece and Scandinavia.

      R1a1a* is widespread (from India to Europe).

      So it'd seem that the expansion happened at the R1a1a level and also (maybe at the same time) at the immediately lower levels. But I see no signature of IE expansion in it, sincerely.

    4. Correction: R1a1a* in Underhill 2010 is probably not R1a1a(xR1a1a1) but some other category. Whatever the case, R1a obviously originated in West Asia, from where it spread to Europe and South Asia. Sorry, but I see no data supporting a Europe-to-India pattern, only a radial one from West Asia. You could argue for a Neolithic origin, I guess.

    5. Don't worry, you'll get it eventually when more stuff comes out. No point arguing about it now.

    6. I worry that you state claims without evidence. It's not just annoying but says little about the quality of your assessment.

  2. It must be from the Punjabi diaspora in Malaysia

    Replies
    1. It must be, it is in fact a random sample of Indians from Malaysia taken from the 1000 genomes project, who are 85% Tamil.

      I have no idea why they did not use other 1000 Genomes samples, like Gujaratis from Houston (GIH) or the several Pakistani ones, but worry not: someone will, and soon. I just don't have the means or know-how, but the data is stored in a public-access database and it is all a matter of someone checking it.

    2. The GIH have been checked. They're Z93+.

      All the 1000 Genomes samples have been checked. That's how many of the new SNPs have been found.

  3. Could you please send me a link to the site?

  4. my email is clusteredmaps@aol.com
    and I could put the site on the blog clusteredmaps.blogspot.com

    Replies
    1. Uh? The site? What site?

      Your blog has not been updated in two years, incidentally.

    2. That is because I need new info. In addition to this, I have R1a, so I would really appreciate the data.

    3. What I don't understand is the nature of your request. Do you want a copy of the paper or what?

    4. Just a copy of the PDF paper, or the information about the frequency of R1a1 in all of the populations studied: a list of the populations studied and the frequency of R1a in each. The supplementary data is sketchy. For example, who is 728/05 in http://onlinelibrary.wiley.com/store/10.1002/ajpa.22167/asset/supinfo/AJPA_22167_sm_SuppTab2.xls?v=1&s=86bbd2fd32edea0a235d3ff2ff3ff87ac873d07d

  5. "R1a1a1a (L664/S298) ··> where?"

    Northwestern Europe.

    "On the other side, the authors attempt to read too much, not just on these haplogroups but specially on molecular-clock-o-logic estimates, (based on the Zhivotovsky mutation rate, now considered obsolete even by molecular clock enthusiasts). A corrected age estimate would be roughly doubly old[ref 1, ref 2] and that means that neither the Kurgan expansion nor the Neolithic one could account for its arrival to Europe."

    You seem pretty confused. Pedigree estimates of autosomal SNP mutation rates have nothing to do with Y STR mutation rates, which have their own pedigree estimates (leading to dates much younger than with Zhivotovsky).

  6. "R1a1a1a (L664/S298) ··> where?"

    Northwestern Europe.

    "On the other side, the authors attempt to read too much, not just on these haplogroups but specially on molecular-clock-o-logic estimates, (based on the Zhivotovsky mutation rate, now considered obsolete even by molecular clock enthusiasts). A corrected age estimate would be roughly doubly old[ref 1, ref 2] and that means that neither the Kurgan expansion nor the Neolithic one could account for its arrival to Europe."

    Pedigree estimates of autosomal SNP mutation rates have nothing to do with Y STR mutation rates, which have their own pedigree estimates (leading to dates much younger than with Zhivotovsky).

    Replies
    1. "Northwestern Europe".

      Why? Source?

      Also, only there? How can we know?

      ...

      As for the other aspect, are you saying that the authors counted every single SNP in the Y chromosome (of some people) and are no longer using STR-based estimates?

      And, if so, which is their calibration point? The SNP-based estimates I have seen usually have totally wrong calibration points, like attributing 50 Ka to the OoA when it is 125-80 Ka, depending on which part of the episode: 80 Ka would be for F or CF, already in South Asia. If you calibrate the B/CF'DE split at 50 Ka when it must be before 125 Ka, you also get the wrong results.

      But it would be nice to clarify if they are counting all SNPs in some individual lines, something that is indeed possible nowadays, and, if so, which is their calibration reference.

  7. I am an Indian with r1a1 as determined by the National Geographic Genographic project, so the comment above by davidski is weird.

    Replies
    1. Hi Ash, but have you been tested downstream of R1a1, for example for M17 or Z93?

      The Genographic Project, AFAIK, is an old (and a bit "pop") reference anyhow. For more than a decade now the quasi-official reference for Y-DNA nomenclature has been ISOGG, which is kept up to date with the latest genetic research (or almost).

      What exactly do you find "weird" in Davidski's comment?

