March 28, 2012

Y-DNA from Afghanistan

Hazaras (source)
Afghanistan was one of those potentially key crossroads with only indirect sampling, mostly via ethnic relatives from Pakistan. Therefore we must welcome with great applause the following paper, which fills a gap in our knowledge (next Burma, please):


Afghanistan has held a strategic position throughout history. It has been inhabited since the Paleolithic and later became a crossroad for expanding civilizations and empires. Afghanistan's location, history, and diverse ethnic groups present a unique opportunity to explore how nations and ethnic groups emerged, and how major cultural evolutions and technological developments in human history have influenced modern population structures. In this study we have analyzed, for the first time, the four major ethnic groups in present-day Afghanistan: Hazara, Pashtun, Tajik, and Uzbek, using 52 binary markers and 19 short tandem repeats on the non-recombinant segment of the Y-chromosome. A total of 204 Afghan samples were investigated along with more than 8,500 samples from surrounding populations important to Afghanistan's history through migrations and conquests, including Iranians, Greeks, Indians, Middle Easterners, East Europeans, and East Asians. Our results suggest that all current Afghans largely share a heritage derived from a common unstructured ancestral population that could have emerged during the Neolithic revolution and the formation of the first farming communities. Our results also indicate that inter-Afghan differentiation started during the Bronze Age, probably driven by the formation of the first civilizations in the region. Later migrations and invasions into the region have been assimilated differentially among the ethnic groups, increasing inter-population genetic differences, and giving the Afghans a unique genetic diversity in Central Asia.

Fig. 1 - PCA derived from Y-chromosomal haplogroup frequencies

To my mind, the really interesting stuff is in supplementary table 4, which lists all the haplogroups tested for the Afghan samples.

Large and medium samples (n>10), simplified (largest haplogroups only):
  • Hazara (n=60): 20 C3 (33%), 10 J2a* (17%), 6 J2a5 (10%), 4 R1a1a (7%), 3 B (5%), 3 E1b1b1c1 (5%)
  • Tajik (n=56): 17 R1a1a (30%), 9 J2a (16%), 5 O (9%), 3 H1a (5%)
  • Pashtun (n=49): 25 R1a1a (51%), 9 Q (18%), 6 L1c (12%), 3 G2c (6%)
  • Uzbek (n=17): 7 C3 (41%), 3 R1a1a (18%), 2 R1b1a2 (12%)
  • Baluch (n=13): 8 L1a (62%), 2 R2a (15%)
Small and tiny samples (n<10):
  • Noristani (n=5): 3 R1a1a, 1 R2a, 1 J2a*
  • Arab (n=3): 2 L1a, 1 R2a
  • Turkmen (n=1): 1 R1a1a
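The percentages above are plain proportions: each haplogroup count divided by the group's sample size. A minimal Python sketch that recomputes them from the simplified counts and checks that the group sizes add up to the 204 Afghan samples mentioned in the abstract. It assumes rounding to the nearest whole percent; the dictionary layout and function names are my own, not the paper's. Since only the largest haplogroups are listed, a group's percentages need not sum to 100.

```python
# Simplified counts from the list above: group -> (sample size, {haplogroup: count}).
# Only the largest haplogroups are included, so counts per group do not sum to n.
SAMPLES = {
    "Hazara": (60, {"C3": 20, "J2a*": 10, "J2a5": 6, "R1a1a": 4, "B": 3, "E1b1b1c1": 3}),
    "Tajik": (56, {"R1a1a": 17, "J2a": 9, "O": 5, "H1a": 3}),
    "Pashtun": (49, {"R1a1a": 25, "Q": 9, "L1c": 6, "G2c": 3}),
    "Uzbek": (17, {"C3": 7, "R1a1a": 3, "R1b1a2": 2}),
    "Baluch": (13, {"L1a": 8, "R2a": 2}),
    "Noristani": (5, {"R1a1a": 3, "R2a": 1, "J2a*": 1}),
    "Arab": (3, {"L1a": 2, "R2a": 1}),
    "Turkmen": (1, {"R1a1a": 1}),
}


def percentages(n, counts):
    """Return each haplogroup's share of the sample, rounded to a whole percent."""
    return {hg: round(100 * k / n) for hg, k in counts.items()}


def total_sample_size(samples):
    """Sum the sample sizes of all groups."""
    return sum(n for n, _ in samples.values())


if __name__ == "__main__":
    for group, (n, counts) in SAMPLES.items():
        print(f"{group} (n={n}): {percentages(n, counts)}")
    print("Total:", total_sample_size(SAMPLES))
```

Keeping the raw counts next to the sample sizes makes it easy to spot a percentage that does not match its count.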

Hazara Y-DNA oddities (B and M1)

The Hazara Country (source) is the center of Afghanistan
I must say that what struck me most were the three Hazaras carrying Y-DNA B. This lineage is almost unreported in Eurasia, and even less expected in a population that shows no other signs of African admixture. 

Supplementary table 1 lists all haplotypes, and those of the three Y-DNA B Hazaras (two from Bamiyan and one from Ghor) show some differences: they are not recent patrilineal relatives. Whenever this African lineage arrived in the area, it has since had some time to evolve and diverge locally.
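One simple way to put a number on "some differences" between Y-STR haplotypes is to compare them locus by locus: how many loci differ, and by how many repeat units in total (a stepwise distance). A minimal Python sketch follows. The locus names are real Y-STR markers, but the repeat counts are invented placeholders, NOT the values in Supplementary table 1, and the sample labels are hypothetical. As a rough rule, identical or one-step haplotypes suggest close patrilineal kin, while several differing loci point to an older split; per-locus mutation rates vary, so this is only an indicator.

```python
from itertools import combinations

# HYPOTHETICAL haplotypes: real locus names, invented repeat counts.
# Substitute the actual values from Supplementary table 1 to use this.
HAPLOTYPES = {
    "B_sample_1": {"DYS19": 15, "DYS390": 21, "DYS391": 10, "DYS392": 11, "DYS393": 13},
    "B_sample_2": {"DYS19": 15, "DYS390": 22, "DYS391": 10, "DYS392": 11, "DYS393": 14},
    "B_sample_3": {"DYS19": 16, "DYS390": 21, "DYS391": 11, "DYS392": 11, "DYS393": 13},
}


def step_distance(h1, h2):
    """Total repeat-unit difference over loci typed in both haplotypes."""
    shared = h1.keys() & h2.keys()
    return sum(abs(h1[locus] - h2[locus]) for locus in shared)


def mismatched_loci(h1, h2):
    """Number of shared loci at which the two haplotypes differ."""
    shared = h1.keys() & h2.keys()
    return sum(1 for locus in shared if h1[locus] != h2[locus])


def pairwise_report(haplotypes):
    """Map each unordered pair of samples to (mismatched loci, step distance)."""
    return {
        (a, b): (mismatched_loci(haplotypes[a], haplotypes[b]),
                 step_distance(haplotypes[a], haplotypes[b]))
        for a, b in combinations(sorted(haplotypes), 2)
    }


if __name__ == "__main__":
    for pair, (loci, steps) in pairwise_report(HAPLOTYPES).items():
        print(pair, f"{loci} loci differ, {steps} steps")
```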

Are we facing yet another puzzling Out-of-Africa remnant, like East Asian Y-DNA DE (mostly D)? Or is it a more recent arrival? If so, how did it reach such high frequencies among the Hazara (and only among them)?

The Hazara sample also includes an individual with Y-DNA M1, which is in principle a Melanesian lineage, i.e. another haplogroup which should not be there, but this one from the opposite corner of the Old World.

Dominant lineages

Otherwise it seems evident that Y-DNA R1a1a dominates among Indoeuropean speakers (Pashtun, Tajik and Noristani), C3 among the Uzbek and Hazara and L1a among the Baluch and "Arab" (who seem identical to the Baluch).

J2a (maybe a Neolithic layer) is also important among Tajiks and Hazaras, while Q is very important among Pashtuns (Q is most basally diverse in West Asia, in case you do not know, even if it is most frequent among Native Americans).


  2. It has to be a sampling error in the B and M1 part. Strangely, I have also come across haplogroup O3 in Spain, which is really weird. It's really strange that R1b1b1-M73 is found at a high frequency (32%) in the 2006 study by Sengupta et al. (2006) but at 0% in this study of yours. Also, haplogroup O3, another marker that came from the Mongols, is not in your study, but it reaches 8% in that Hazara sample. I don't know about the other groups, but something is definitely wrong with the Hazara data in this new study.

    Sengupta et al. (2006), Hazara (n=25):
    10/25 = 40% C3-M217
    1/25 = 4% I2b1b-M379
    1/25 = 4% J2a-M410
    2/25 = 8% O3-M122
    1/25 = 4% Q1a1-M120
    1/25 = 4% Q1b-M378
    8/25 = 32% R1b1b1-M73
    1/25 = 4% R2-M124
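Whether a 32% vs 0% gap could plausibly arise from sampling chance alone can be checked with Fisher's exact test on a 2x2 table of carriers and non-carriers. A minimal standard-library Python sketch, assuming that the new study actually typed M73 in its 60 Hazaras and found none, and that both samples are random draws from a single population; the function name is my own. A tiny p-value only says chance is an unlikely explanation. It cannot tell a lab error apart from, for example, the two samples coming from different Hazara groups.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are samples, columns are carriers / non-carriers. With all margins
    fixed, the number of carriers in row 1 follows a hypergeometric law.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability of x carriers in row 1 given the fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Add up every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


if __name__ == "__main__":
    # R1b1b1-M73: 8/25 in Sengupta et al. (2006) vs 0/60 assumed for the new study
    print(fisher_exact_two_sided(8, 17, 0, 60))
    # C3: 10/25 vs 20/60, for comparison
    print(fisher_exact_two_sided(10, 15, 20, 40))
```

With these inputs the M73 comparison comes out far below 0.001, while the C3 comparison is well above 0.05.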

    1. O3 in Spain should not be so strange in very small amounts or in specific locations. It could be of Filipino or even Japanese origin. In the 16th century there was a formal Japanese embassy near Seville, at Coria del Río, and apparently they left descendants among the people whose surname is "Japón", Spanish for Japan (see the Spanish Wikipedia).

      As for the Hazara, we can't 100% rule out an error, but the data may also be correct. I don't feel able to judge.

    2. The Japanese embassy episode actually dates from the very early 17th century. There's an article in English:

    3. That may explain the presence of haplogroup O3 in Spain. But how do we explain the disappearance of haplogroup R1b1b1-M73 in this new study? The previous study showed a high frequency of 32%, but this new study shows 0%.

      Haplogroup E1b1b1c1 is another strange marker, not found in any Afghan ethnic group except the Hazara. It most likely originated from Ashkenazi Jews, who also have 10% E1b1b1c1; this may also explain the presence of Q1b in the Hazara, which is found at 5% in Ashkenazi Jews. Sephardic Jews also have a 10% frequency of E1b1b1c1, but its highest frequency is in the Dead Sea area of Jordan, at 31.1%. It is also found at 5% in Turkey, 8.1% in Oman and 10% in Galicia/Spain. It is also at about 11-15% in Ethiopian Jews. I was thinking maybe this B marker came from Ethiopian Jews, but other Ethiopians have only 3.9-4.6%, and haplogroup B is non-existent in Ethiopian Jews, while even other Ethiopians usually have only 2.3-5.4%.

      -----> LINK Y-DNA haplogroup of Ethiopians

      The Hazara have no African mtDNA, but a high frequency (65%) of West Eurasian/South Indian mtDNA and 35% East Eurasian mtDNA. In Pakistan, by contrast, there is an ethnic group called the Siddi, who are a mixture of Africans and local Pakistanis. They have E3a at a frequency of only 5%, but mtDNA analysis reveals approximately 40% L1a, L2a, L2b and L2d. Yet even they show no haplogroup B or M1.

    4. Sorry, I have no good explanation.

      My first thought was to imagine them as anomalous but somehow fitting fossils of the Eurasian colonization process: I paralleled Hazara B with East Asian DE (D) and Hazara M1 with its South/Central/West Eurasian relative under MNOPS (P and its subclades Q and R).

      But of course it can be an error or can have other explanations.

      Anyhow, I imagine that Sengupta's Hazaras are not from Afghanistan but from Pakistan, right? For a long time, most of the information about Afghan ethnicities came from their Pakistani relatives. In fact I think that this is the first Y-DNA study of Afghans as such; correct me if I'm wrong.

    5. Maju is correct. There are two distinct Hazara peoples. The Hazaras in Pakistan comprise a multitude of different ethnicities (Afghan speakers, Hindko, Turkic), unlike the Afghan Hazaras, who are distinctly of Mongolian-Turkic derivation.

      African descent may be uncommon, but not implausible. Afghanistan was a major trade route for goods and also for people. There was a thriving slave trade in Central Asia, and there could have been some introductions from either India or the Middle East.

    6. African descent is maybe plausible (there are communities and individuals with recent African ancestry in Pakistan, the Persian Gulf or even India) but the lineage in question is anomalous and too diverse to be a recent founder effect.

      Haplogroup B is concentrated in Central Africa and is most common among Pygmies, being rather rare in other populations (see, for example, the frequency maps in Chiaroni 2009). It is hard to explain how B could produce a small founder effect with relatively high diversity in Afghanistan while no E(xE1b1b1), the dominant group of lineages in Africa south of the Sahara, arrived with it from the same area.

      However, more recently Y-DNA B has also been found at low frequencies in Saudi Arabia and, at higher frequencies, in Hormozgan (Iran, the area of the historical trading city of Hormuz). Interestingly, Afro-Iranians from Hormuz have none of it, whereas other local populations have it at 2.3% (Bandari) and 8.2% (Gheshmi).

      The proportions in these three Hormuz populations are effectively inverse for B and E1b1a1, which dominates (25%) among the Afro-Iranians. This strongly suggests that the arrival of Y-DNA B is unrelated to the arrival of Afro-descendants with the Indian Ocean slave (and other) trade. I am guessing that it might have arrived with traders from what is now Sudan, maybe even in pre-Islamic times (Nubia), but it's hard to say.


    7. Haplogroup B in the Hazara comes from Iranian men. That's because today's Hazara are indeed a mixture of Mongolians and Iranians. It's even more obvious when you compare their Y chromosomes. See how J1, J2, R1a1a and E1b1b1c1 are all present in Iranians at significant frequencies, especially J?

      Y-DNA haplogroup B has been found in 3% (3/117) of Iranians as well: 2.3% (Bandari) and 8.2% (Gheshmi). Iranians also have African mtDNA, about 3% L3 on average, but more than 8% in some Iranian provinces. Most Iranian provinces usually have only 3% African Y-DNA and mtDNA.

      The Arabs conquered Iran in the 7th century, more than five centuries before the creation of the Mongol Empire. Obviously Iranians had already mixed with African slaves, but centuries later I doubt they would have looked anything like Africans; they would have looked like modern-day Iranians.


      Like I said, let's not forget that the Hazara are a mixture of Mongolians and Iranians.

      The Hazara have more Mongoloid Y-DNA than Mongoloid mtDNA.
      The Hazara have more Caucasoid mtDNA than Caucasoid Y-DNA.
      But at the same time they still have more Caucasoid than Mongoloid Y-DNA; it's just that Caucasoid mtDNA is 20% higher.

      Apart from C3, O3, Q, N etc.: haplogroup C3 is the most frequent because it comes from the Mongols, but the second most frequent, haplogroup J, is the most frequent in Iranians, and E1b1b1c1 and R1a are present at moderate frequencies in the Hazara, just as they are at significant frequencies in Iranians.

    8. It's not just any Iranians, much less all Iranians, it's just those from Hormuz without previously known African ancestry (i.e. not Afro-Iranians).

      I must insist on that because, to my eyes, it does look like Hormuz could be the source, but the origin could be recent Indian Ocean trade routes (not likely), more ancient ones (maybe from the time when Nubia was quite prosperous) or even a prehistoric residue (??). Only careful analysis of the haplotypes can give some answers, if at all.

      It is important to notice that the more B a population has in Hormuz, the less of other more typical African lineages (E1b1a), so it's still anomalous and in need of a good explanation.

    9. I want to mention something about the Hazaras. First, the 32% R1b1b1-M73 comes from the Hazaras of Pakistan. They migrated to Pakistan about 100 years ago from the southernmost part of Hazarajat, Ghazni, or more exactly from the Jaghori tribe. This tribe is one of the largest groups of all Hazaras, with more mixed features shared with the nearby Pashtun or Tajik people, and the migration increased this mixing even further. So, as a Hazara, these results don't surprise me. Unfortunately, the new study missed this large group of Jaghori or Ghazni Hazaras, and only very limited samples were taken.
      About the E and J haplogroups, it is important to consider the direct and historically important religious relationship between Hazaras and Iranians, both Persian-speaking Shia Muslims. The only people who could enter the community were Shias, mostly religious elders and clerics, or the Turkic people called Qizilbash. About the strange B group, it is also important to note that the 3 samples were found in the western Hazara tribe named Day-Zangi; zangi means black in Persian, but they are not black people nowadays!

    10. Hasan: this study is focused on Afghanistan and, AFAIK, is the first one sampling inside this country, so it's only logical that they ignored these groups from Pakistan. There are plenty of genetic studies about Pakistan, including Pashtuns and Hazaras (not sure about the specific group you mention).

      These Pakistani populations are shown in some of the graphics, such as the PCA, where both Hazara populations are rather close to each other, although it's true that the Afghan Hazaras (4) are more of an outlier than the Pakistani ones (18).

      As for the "strange" haplogroup B, it was later spotted at significant frequencies in Hormozgan (Iran), so that's probably the direct source. How it reached Hormuz is another story.


  4. Khazar > Hazara: too obvious? Some Hebrew genes mixed with Asiatic ones in the Khazar Empire, in the Silk Road trade c. 650-1,000 CE. Some clues/connections may come from the dispersed Bukharian Jews, who preserve much of the old culture, such as a language that is a cross between Dari Persian and Turkoman.

    1. The sound coincidence may be just that: a coincidence. Anyhow, the Khazars were not Jews by ancestry but Turkic converts. I'm not sure where you see the "Hebrew lineages" either.


