March 16, 2016

South Asian autosomal structure

A recent study finds "five" components, although in practice they can be reduced to three.

Analabha Basu et al., Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure. PNAS 2015. Freely accessible [doi: 10.1073/pnas.1513197113]


India, occupying the center stage of Paleolithic and Neolithic migrations, has been underrepresented in genome-wide studies of variation. Systematic analysis of genome-wide data, using multiple robust statistical methods, on (i) 367 unrelated individuals drawn from 18 mainland and 2 island (Andaman and Nicobar Islands) populations selected to represent geographic, linguistic, and ethnic diversities, and (ii) individuals from populations represented in the Human Genome Diversity Panel (HGDP), reveal four major ancestries in mainland India. This contrasts with an earlier inference of two ancestries based on limited population sampling. A distinct ancestry of the populations of Andaman archipelago was identified and found to be coancestral to Oceanic populations. Analysis of ancestral haplotype blocks revealed that extant mainland populations (i) admixed widely irrespective of ancestry, although admixtures between populations was not always symmetric, and (ii) this practice was rapidly replaced by endogamy about 70 generations ago, among upper castes and Indo-European speakers predominantly. This estimated time coincides with the historical period of formulation and adoption of sociocultural norms restricting intermarriage in large social strata. A similar replacement observed among tribal populations was temporally less uniform.

One of the components, very distant from the rest, is the Andamanese one (Jarawa, Onge). However, these isolated islands are not really in South Asia but in SE Asia (south of Myanmar, belonging to India only by historical accident), which reduces the structure of South Asia to what we can see in the following graph:

Fig. 2.
(A) Scatterplot of 331 individuals from 18 mainland Indian populations by the first two PCs extracted from genome-wide genotype data. Four distinct clines and clusters were noted; these are encircled using four colors. (B) Estimates of ancestral components of 331 individuals from 18 mainland Indian populations. A model with four ancestral components (K = 4) was the most parsimonious to explain the variation and similarities of the genome-wide genotype data on the 331 individuals. Each individual is represented by a vertical line partitioned into colored segments whose lengths are proportional to the contributions of the ancestral components to the genome of the individual. Population labels were added only after each individual’s ancestry had been estimated. We have used green and red to represent ANI and ASI ancestries; and cyan and blue with the inferred AAA and ATB ancestries. These colors correspond to the colors used to encircle clusters of individuals in A. (Also see SI Appendix, Figs. S2 and S3.)

It is quite apparent that the AAA (Ancestral Austroasiatic) component behaves like the ASI (Ancestral South Indian) one but with a tendency towards the ATB (Ancestral Tibeto-Burman) one, strongly suggesting that it is basically a product of admixture and not a truly autonomous ancestral component.

This may be more apparent in the wider pan-Asian context:

Fig. 3.
Approximate “mirroring” of genes and geography. Genomic variation of individuals, represented by the first two PCs, sampled from 18 mainland Indians combined with the CS-Asians and E-Asians from HGDP, compared with the map of the Indian subcontinent showing the approximate locations from which the individuals and populations were sampled.

In this wider mapping (which would be even clearer if West Asian populations were included), we see that:
  1. ANI (Ancestral North Indian) strongly tends to the West. In other analyses it is very similar to the Caucasus modal component, so the logical conclusion is that we are looking at a Neolithic immigrant element, much as happens in Europe.
  2. ATB (Ancestral Tibeto-Burman) strongly tends to the East, more specifically SE Asia, and is therefore the reverse of ANI, although much less influential.
  3. ASI (Ancestral South Indian) is the true aboriginal (pre-Neolithic) component of India, better preserved in southern populations but more clinal than the sample choice allows us to perceive.
  4. AAA (Ancestral Austroasiatic) is very similar to ASI but has some SE Asian admixture, as is logical to expect, Austroasiatic being a SE Asian language family of likely Neolithic expansion.
So ASI and AAA are basically the same thing and that's why I say that the "five" components can be simplified to just three. That said, it is indeed possible that there is underlying complexity within the ASI+AAA component, but this study does not help us clarify that.

It is true that K=4 (after exclusion of the Andamanese; K=5 with them) fits the parsimony criterion best, but K=3 is also a good fit and shows AAA exactly as I describe it: largely ASI ("aboriginal") with a significant ATB (Eastern) component. The AAA component can therefore be perceived as consolidated, homogenized, ancient admixture. Prove me wrong on this and I'll eat my words.
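To make the "AAA is mostly ASI plus some ATB" reading concrete, here is a minimal sketch of how one could estimate a two-source mixing fraction from allele frequencies by least squares. This is not what ADMIXTURE does internally (it fits a likelihood model on individual genotypes), and every number below is made up for illustration: `asi_like`, `atb_like` and `aaa_like` are hypothetical toy vectors, not data from the paper.

```python
# Toy illustration only: all allele frequencies below are invented.
def mixture_fraction(target, source_a, source_b):
    """Least-squares estimate of alpha in
    target ~ alpha * source_a + (1 - alpha) * source_b,
    clipped to the interval [0, 1]."""
    num = sum((t - b) * (a - b) for t, a, b in zip(target, source_a, source_b))
    den = sum((a - b) ** 2 for a, b in zip(source_a, source_b))
    if den == 0:
        raise ValueError("sources are identical; the fraction is undefined")
    return min(1.0, max(0.0, num / den))

# Hypothetical allele frequencies at five loci for two source populations.
asi_like = [0.10, 0.80, 0.35, 0.60, 0.05]
atb_like = [0.70, 0.20, 0.55, 0.10, 0.45]

# Hypothetical "AAA-like" population, built here as 75% ASI-like + 25% ATB-like.
aaa_like = [0.75 * a + 0.25 * b for a, b in zip(asi_like, atb_like)]

alpha = mixture_fraction(aaa_like, asi_like, atb_like)
print(f"estimated ASI-like fraction: {alpha:.2f}")
```

Because the toy target is constructed as an exact mixture, the estimate recovers the fraction used to build it. Real populations also carry drift and noise, so a real analysis would never fit this cleanly.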

Caste apartheid stopped genetic flow

Quite interestingly, the authors also dwell on how the admixture process was stopped by the Gupta laws (Middle Ages), which imposed apartheid (the caste system), enforced endogamy and caused the now apparent genetic isolation of the multiple groups.

We have provided evidence that gene flow ended abruptly with the defining imposition of some social values and norms. The reign of the ardent Hindu Gupta rulers, known as the age of Vedic Brahminism, was marked by strictures laid down in Dharmaśāstra—the ancient compendium of moral laws and principles for religious duty and righteous conduct to be followed by a Hindu—and enforced through the powerful state machinery of a developing political economy (15). These strictures and enforcements resulted in a shift to endogamy. The evidence of more recent admixture among the Maratha (MRT) is in agreement with the known history of the post-Gupta Chalukya (543–753 CE) and the Rashtrakuta empires (753–982 CE) of western India, which established a clan of warriors (Kshatriyas) drawn from the local peasantry (15). In eastern and northeastern India, populations such as the West Bengal Brahmins (WBR) and the TB populations continued to admix until the emergence of the Buddhist Pala dynasty during the 8th to 12th centuries CE. The asymmetry of admixture, with ANI populations providing genomic inputs to tribal populations (AA, Dravidian tribe, and TB) but not vice versa, is consistent with elite dominance and patriarchy. Males from dominant populations, possibly upper castes, with high ANI component, mated outside of their caste, but their offspring were not allowed to be inducted into the caste. This phenomenon has been previously observed as asymmetry in homogeneity of mtDNA and heterogeneity of Y-chromosomal haplotypes in tribal populations of India (6) as well as the African Americans in United States (34). In this study, we noted that, although there are subtle sex-specific differences in admixture proportions, there are no major differences in inferences about population relationships and peopling whether X-chromosomal or autosomal data are used. We have also found our inferences to become more robust when our data are jointly analyzed with HGDP data.
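As a rough sanity check on the dating, the "about 70 generations ago" figure from the abstract can be converted into calendar years. The sketch below is only arithmetic under stated assumptions: the generation lengths tried (22.5, 25 and 29 years) and the reference year (the year of this post) are my own illustrative choices, not values taken from the text quoted here.

```python
# Assumptions: the generation lengths and the reference year are illustrative
# choices, not values stated in the passages quoted in this post.
GENERATIONS_AGO = 70     # figure quoted in the paper's abstract
REFERENCE_YEAR = 2016    # year of this post, used as "now"

def calendar_year(generations_ago, years_per_generation,
                  reference_year=REFERENCE_YEAR):
    """Approximate calendar year (negative means BCE) of an event dated to
    `generations_ago` generations before `reference_year`."""
    return reference_year - generations_ago * years_per_generation

# Try several plausible generation lengths to see how sensitive the date is.
estimates = {g: calendar_year(GENERATIONS_AGO, g) for g in (22.5, 25.0, 29.0)}
for years_per_generation, year in estimates.items():
    print(f"{years_per_generation} years/generation -> around year {year:.0f}")
```

With 22.5 years per generation the estimate falls around 441 CE, within the Gupta period (roughly the 4th to 6th centuries CE). Longer generation lengths push it several centuries earlier, so the match with the Gupta era depends noticeably on that assumption.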

I can't help but find it quite curious how, once again, Indian and European histories behave so similarly: in Europe too, a simpler but also "god-sanctioned" caste system (designed by Augustine of Hippo) was imposed upon the collapse of the Roman Empire (very similar dates). However, popular revolutions gradually but systematically destroyed it. The same is happening in India now but on a delayed timeline. Muslim West Asia (and surroundings), by contrast, had no caste system, and that's probably why it was so successful back in the day: it allowed relatively more freedom and intellectual pursuit than other neighboring social systems. Of course, this stopped being the case after the Mongol conquests, roughly coincident with the European Renaissance, when Islam cocooned itself into reactionary mode, leading to stagnation and eventually to colonial subservience.


  1. I'm a layman interested in this issue and I was wondering if you could answer some questions for me.

    Does every Indian have some ATB ancestry like every Indian has ANI and ASI ancestry? And does ATB ancestry follow a geographical distribution like ANI and ASI do? If so, what is that distribution?

    1. According to the data produced by the algorithm ADMIXTURE¹, partly visible in the fig. 2 reproduced above, the ATB component is rare outside Tibeto-Burman and (to a lesser extent) Austro-Asiatic populations. Of all the Indo-Aryan and Dravidian populations, only WBR (West Bengal Brahmins, as spelled out in the paper's discussion quoted above) have it, at very small frequencies.

      This should not be unexpected: in principle Austroasiatics (their language and ethno-cultural identity; genetically they are mostly native) arrived with the "rice Neolithic", while the second oriental wave, the Tibeto-Burman one, is much more recent, spreading mostly through the southern Himalayan mountain areas. Unlike the Austroasiatic wave, which seems to imply assimilation of native Indians to a very large extent, the TB one seems to be a wave of colonists, with much less native admixture, almost none in some cases. This is also apparent in physical appearance: TB populations almost invariably look East Asian, while AA ones look just typical South Asian instead.

      ¹ Always approximate: autosomal genetics involve a massive amount of data, with minor variations between individuals (because sexual reproduction produces dramatic complexity, which helps us to avoid illnesses and makes sure that someone is almost always able to survive catastrophes; that's why diversity is good), and must hence be analyzed by statistical methods, invariably subject to some degree of error. ADMIXTURE is a good, well tested algorithm but nothing is perfect, much less simple, and different parameters (such as a different sampling strategy) may well produce different results (or not).

    2. So does that mean that AAA component is much more widespread among Indians?

    3. Yes, of course: it's much more common and intense.

      However, what I'm questioning here is whether the 4-component model actually expresses the ancient Indian "melting pot" well, and therefore I suggest that everyone interested also look at the, quite valid, 3-component model available in the supplemental materials. Reading the issue this way, the AAA component does not exist and is merely a mixture of ASI and ATB: a consolidated old one but not a fully autonomous component. In that case, though, ATB does not represent the Tibeto-Burmans only but all the East Asian influences, just as ANI represents the West Eurasian ones.

      We find this simpler 3-way formula in Fig. Supplement(i) at K=4 (the fourth component being the Andamanese and hence equivalent to K=3 without them), warning: color coding is different, with cyan color representing ANI.

      There we see that the blue component (ATB or generic East Asian) is present in Tibeto-Burmans and Austroasiatics (as expected): dominant in the former, minor in the latter. Of all the rest, only WBR shows the blue East Asian component, at very low frequencies. So mainline Indians in general do not carry East Asian ancestry (or only a tiny amount at most), being a blend of the red component (aboriginal South Asian, best represented in this sample by PNY: Paniya) and the cyan one (West Eurasian inputs or, more precisely, the equivalent South Asian admixed result, represented by KSH: Kshatriya, who should actually look like a mix of Caucasus and Aboriginal Indian/ASI in a wider sample including West Asian controls). Austroasiatic peoples instead appear in the K=3 model as a mix of ASI and ATB (Aboriginal Indian and East Asian respectively), with only a limited cyan component (except GND, where it is more important).

      I wish that this same analysis had been done with West Asian controls: it would almost certainly have produced a best fit at K=3. It can be done at home with the 1000 Genomes or another similar dataset, but I just don't have the energy or the interest; it has been done before by others in any case.

    4. What I mean, in a few words, is that the AAA component is masking the ASI one when the latter is found with ancient East Asian ("ATB") admixture. So I would not pay too much attention to it.

  2. Luis... wake up!

    Sent an email. Have a great read. It's my thesis on Shulaveri-Shomu being the origin of R1b in Europe and the proto-culture of the Bell Beakers. Remember?
    Just download it from here:

    1. I got your email yesterday, Olympus. I just can't cope emotionally with life itself these days, months, whatever... so I'll reply to you soon[TM].

  3. Luis, life might not be simple... but it's really what we make of it. Make it easier! Not all of it, naturally, but a great part of the problem is really in your mind. -- Get out there and fight.

