June 14, 2014

Ancient inter-continental admixture in the Horn of Africa

A new and quite interesting study finds strong support for Upper Paleolithic (~ LSA) Eurasian inflows into the Horn of Africa and confirms that most of the populations of that region are in essence an ancient mix of West Eurasian and African ancestries.

Jason A. Hodgson et al., Early Back-to-Africa Migration into the Horn of Africa. PLoS Genetics 2014. Open access → LINK [doi:10.1371/journal.pgen.1004393]

Genetic studies have identified substantial non-African admixture in the Horn of Africa (HOA). In the most recent genomic studies, this non-African ancestry has been attributed to admixture with Middle Eastern populations during the last few thousand years. However, mitochondrial and Y chromosome data are suggestive of earlier episodes of admixture. To investigate this further, we generated new genome-wide SNP data for a Yemeni population sample and merged these new data with published genome-wide genetic data from the HOA and a broad selection of surrounding populations. We used multidimensional scaling and ADMIXTURE methods in an exploratory data analysis to develop hypotheses on admixture and population structure in HOA populations. These analyses suggested that there might be distinct, differentiated African and non-African ancestries in the HOA. After partitioning the SNP data into African and non-African origin chromosome segments, we found support for a distinct African (Ethiopic) ancestry and a distinct non-African (Ethio-Somali) ancestry in HOA populations. The African Ethiopic ancestry is tightly restricted to HOA populations and likely represents an autochthonous HOA population. The non-African ancestry in the HOA, which is primarily attributed to a novel Ethio-Somali inferred ancestry component, is significantly differentiated from all neighboring non-African ancestries in North Africa, the Levant, and Arabia. The Ethio-Somali ancestry is found in all admixed HOA ethnic groups, shows little inter-individual variance within these ethnic groups, is estimated to have diverged from all other non-African ancestries by at least 23 ka, and does not carry the unique Arabian lactase persistence allele that arose about 4 ka. Taking into account published mitochondrial, Y chromosome, paleoclimate, and archaeological data, we find that the time of the Ethio-Somali back-to-Africa migration is most likely pre-agricultural.

The study applies three different formal admixture tests (f3, ALDER and D-stat), as well as a ROLLOFF simulation, in order to confirm these findings. This part is quite technical and therefore I am not going to discuss it further. Feel free to explore the extensive supplemental materials.

I will instead dwell on what I know better, ADMIXTURE and FST distances, which are more visually amenable and ultimately tell the same story.

Figure 2. Population structure of Horn of Africa populations in a broad context.
ADMIXTURE analysis reveals both well-established and novel ancestry components in HOA populations. We used a cross-validation procedure to estimate the best value for the parameter for the number of assigned ancestral populations (K) and found that values from 9 to 14 had the lowest and similar cross-validation errors (Figure S2). (A) The differences in inferred ancestry from K = 9–14 are most pronounced in the HOA for K = 10–12, where two ancestry components that are largely restricted to the HOA appear (the dark purple and dark green components). (B) Surface interpolation of the geographic distribution of eight inferred ancestry components that are relatively unchanging and common to the ADMIXTURE results from K = 10–12. (C) Individual ancestry estimation for HOA populations (with language groups indicated) and surface plots of the changing distributions of the Nilo-Saharan (light blue) and Arabian (brown) ancestry components for K = 10–12. At K = 11, a new HOA-specific ancestry component that we call Ethiopic appears (dark purple) and at K = 12 a second new ancestry component that we call Ethio-Somali (dark green) appears with its highest frequencies in the HOA.

Above we have the original presentation of ADMIXTURE results for K=10-12. It must be said that the cross-validation error is lowest (optimal) for K=12, but this value is only slightly smaller than the others in the K=9-14 range, which form a plateau (fig. S2).

Therefore their use of K=10 and K=11 is justified, particularly because it is also interesting to turn off the old amalgamation reflected in the Ethiopic (Ari, Wolayta) and Ethio-Somali (Cushitic, Ethiopian Semitic) components, which is done by using K=10 instead of the optimal K=12.

This issue is best perceived in the FST distances table (within text S1), which I include here with some convenient annotations:

The red-orange colored frames (as well as the red notes on the components) in the table above were added by me to better illustrate the meaning of these FST values:
  • The red frames capture two groups of components with very low differences (<50): West Asia-Europe and West-East Africa.
  • The dark orange frames indicate two other groups with quite low distances (<70): South-Central Asian and the West Eurasian core.
  • The lighter orange frames indicate large clusters of middling distances (<125) of continental nature: Eurasian and African.
  • Intercontinental FST scores are systematically larger: for example, European-West African is 176, while European-East African ("Nilo-Saharan") is 172, only slightly smaller.
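As a rough illustration of how these frames partition the table, the cut-offs listed above can be written as a small lookup. This is only a sketch: the function name and band labels are made up for the example, the thresholds are the approximate ones used for the annotations, and scores are taken in the table's own units.

```python
def fst_frame(score):
    """Return the annotation band for a pairwise FST score (in the table's own units)."""
    if score < 50:
        return "red frame (very low difference)"
    if score < 70:
        return "dark orange frame (quite low distance)"
    if score < 125:
        return "light orange frame (middling, continental)"
    return "no frame (intercontinental range)"


# The two intercontinental examples quoted in the list above.
for pair, score in [("European / West African", 176),
                    ("European / East African", 172)]:
    print(pair, "->", fst_frame(score))
```

Both quoted pairs fall well beyond the widest frame, which is the point of the last bullet.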
It is quite apparent that there are three components that overflow these continental boundaries:
  • The so-called Maghrebi (North African) has some extra affinity with the Ethiopic (Omotic) component, and vice versa. These two components otherwise fall within my approximate continental boxes, but they still show lower scores for all the components of the other "box". This is consistent with their nature as Afro-Eurasian admixed components, each with its own proportions.
  • The Ethio-Somali (Cushitic?) component is actually more intermediate than the previous ones: although its strongest affiliation is towards Eurasia and particularly with the North African and Arabian components, it also shows strong affinity with the core African components (East and West African, i.e. Nilo-Saharan and Niger-Congo). This is consistent with the other evidence in this study that reveals it as an ancient Afro-Asian mix.
I must mention here that some of the labels used by the authors are not at all the ones I would have chosen and this is particularly true re. the Nilo-Saharan (light blue) component, which peaks among the Sandawe (Aboriginal East Africans from Southern Tanzania, speaking a click language), the Anuak (Nilo-Saharan Ethiopians) and the Gumuz (other Ethiopians of quite dubious Nilo-Saharan linguistic affiliation). Hence I prefer to call it East African or East African 1.

The authors conclude with the following remarks (emphasis mine):
We find that most of the non-African ancestry in the HOA can be assigned to a distinct non-African origin Ethio-Somali ancestry component, which is found at its highest frequencies in Cushitic and Semitic speaking HOA populations (Table 2, Figure 2). In addition to verifying that most HOA populations have substantial non-African ancestry, which is not controversial [11]–[14], [16], we argue that the non-African origin Ethio-Somali ancestry in the HOA is most likely pre-agricultural. In combination with the genomic evidence for a pre-agricultural back-to-Africa migration into North Africa [43], [61] and inference of pre-agricultural migrations in and out-of-Africa from mitochondrial and Y chromosome data [13], [32]–[37], [47], [99]–[102], these results contribute to a growing body of evidence for migrations of human populations in and out of Africa throughout prehistory [5]–[7] and suggests that human hunter-gatherer populations were much more dynamic than commonly assumed.

We close with a provisional linguistic hypothesis. The proto-Afro-Asiatic speakers are thought to have lived either in the area of the Levant or in east/northeast Africa [8], [107], [108]. Proponents of the Levantine origin of Afro-Asiatic tie the dispersal and differentiation of this language group to the development of agriculture in the Levant beginning around 12 ka [8], [109], [110]. In the African-origins model, the original diversification of the Afro-Asiatic languages is pre-agricultural, with the source population living in the central Nile valley, the African Red Sea hills, or the HOA [108], [111]. In this model, later diversification and expansion within particular Afro-Asiatic language groups may be associated with agricultural expansions and transmissions, but the deep diversification of the group is pre-agricultural. We hypothesize that a population with substantial Ethio-Somali ancestry could be the proto-Afro-Asiatic speakers. A later migration of a subset of this population back to the Levant before 6 ka would account for a Levantine origin of the Semitic languages [18] and the relatively even distribution of around 7% Ethio-Somali ancestry in all sampled Levantine populations (Table S6). Later migration from Arabia into the HOA beginning around 3 ka would explain the origin of the Ethiosemitic languages at this time [18], the presence of greater Arabian and Eurasian ancestry in the Semitic speaking populations of the HOA (Table 2, S6), and ROLLOFF/ALDER estimates of admixture in HOA populations between 1–5 ka (Table 1).

K=12 detail for a fraction of the Horn of Africa and distribution of the four main components


  1. The genetic distance between the Ethio-Somali and European/West Asian components (31-33) is equivalent to the fst between the former and the three non-hunter-gatherer Sub-Saharan African components (31-34). In addition, the Maghrebi and Arabian components seem to possess non-trivial SSA admixture IMHO. The Ethiopic component seems to be a distinct African cluster, but it is not any closer to the core Western Eurasian components, i.e. European/West Asian, (46-50) than it is to the South or East Asian components (44-46).

    Table 4. Minimum time of divergence of ADMIXTURE inferred ancestry components (ka).

    1. Ka = thousand years.

      The measure in table 4 is a chronological estimate of when the various components diverged from one another. I find it extremely difficult to imagine that the Eurasian-Khoisan divergence is a mere 75 Ka old (maybe twice that age, actually), but that would be another discussion, which I intentionally avoided.

      But in any case this is not genetic distance but something else based on it (formula ref. Holsinger & Weir 2009). The raw genetic distance between the components must be read in the Fst supplementary table that I included above.
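For anyone wondering how a divergence time can be derived from Fst at all, here is a minimal sketch of the usual drift-only approximation, Fst ≈ 1 - exp(-t / 2Ne), solved for t. I am assuming this is the kind of relationship behind table 4, not reproducing the paper's calculation; the function names are made up, and the effective population size and generation length are placeholder values, not the ones used by the authors.

```python
import math


def divergence_generations(fst, ne):
    """Generations since divergence under pure drift: t = -2 * Ne * ln(1 - Fst)."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("fst must be in [0, 1)")
    return -2.0 * ne * math.log(1.0 - fst)


def divergence_ka(fst, ne, years_per_generation):
    """Same estimate expressed in thousands of years (ka)."""
    return divergence_generations(fst, ne) * years_per_generation / 1000.0


# Placeholder inputs (assumptions for illustration, not values from the paper).
NE = 10_000
YEARS_PER_GENERATION = 25

for fst in (0.05, 0.10, 0.15):
    print(f"Fst={fst:.2f} -> ~{divergence_ka(fst, NE, YEARS_PER_GENERATION):.0f} ka")
```

The result scales linearly with both the assumed population size and the generation length, so any date obtained this way is only as good as those two assumptions.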

      "The Ethiopic component seems to be a distinct African cluster, but it is not any closer to the core Western Eurasian components"...

      In the Fst table above it is systematically closer to each of the Eurasian components than any other African one is. Of course one can imagine this to reflect the Ari ("Ethiopic") component being directly ancestral to the Eurasian OoA metapopulation rather than admixed, and I admit that I am agnostic on the matter. However, if I had to bet, I would bet on admixture, because its pattern mirrors that of North Africans, who we know are admixed, and also because I doubt that a 125 Ka old component would still be recognizable in such a crossroads as the Sahel.

    2. I just recalled what the authors say about using three formal tests (f3, D-stat and ALDER), whose results are in the supplementary material. The Ari Cultivator (listed as "Cultivator") scores positive in all three tests.

      The estimated non-African admixture (table S5) is however much lower than among the other admixed populations from the Horn: just 20%. The Wolayta's Eurasian admixture is estimated at 44%, the Oromo's at 55% and that of the other Horners at around 65%.

      However these are actual populations, not components, and for example the Ari Cultivators carry c. 15% of the Ethio-Somali component, which implies that the admixture for the Ari/Ethiopic component is actually smaller, maybe just 10%.

      Whatever the case let's not forget this is very ancient admixture dating to the Paleolithic. Only the Arabian component (brown) should correspond to more recent admixture events.

    3. Wow, there is even more in the supplemental data: text S1, from which I took the Fst table but which I failed to read in full (my bad), argues with formal tests that the Ethio-Somali component is essentially non-African.

      Among these tests there is one that simply compares the ADMIXTURE proportions of each actual population with and without the Ethio-Somali component. When you remove the ES component from the Ari Cultivators, the non-African admixture collapses to 3%, which is negligible.

      So you are right, Yoho: the Ari/Ethiopic component seems African-specific in the strictest sense. I lost my bet.

  2. According to Text S1, the fst index between the Ethio-Somali and European/West Asian components (9.8, 10.3) is equivalent to the fst index between the former and the Ethiopic, Nilo-Saharan, and Niger-Congo components (9.8, 10.3, 10.6); in my opinion, the lower fst values between the Ethio-Somali and Maghrebi/Arabian components could be explained by non-Western Eurasian admixture in the latter given their closer proximity to NE Africa. In retrospect, the Ethio-Somali component clearly isn't a "West Eurasian" component; the component which seems to peak in the Somali is either a) a mixed Ethiopic-Western Eurasian component or b) the elusive "Basal Eurasian" component.

    ps. The Ethiopic component peaks in the Ari-B in SW Ethiopia, a region bordering South Sudan, and is hence not located in the Sahel.

    1. "The Ethiopic component peaks in the Ari-B in SW Ethiopia, a region bordering South Sudan, and is hence not located in the Sahel."

      You're technically right. I was rather thinking of the wider belt of open lands between the Sahara and the jungle, which is much larger, right? For some reason I have grown used to associating the word "Sahel" with the whole wider belt, while it seems that "Sudan" is more correct, but nowadays that name is associated with some specific states, which also causes confusion.

      I meant the open plains corridor between the Sahara and the jungle strip, which spans from the Red Sea to the Atlantic via the Nile, the Chad Basin, the Niger and the Senegal, as well as some other areas like the upper parts of the Volta, etc. This may have been the main corridor of human flows in Africa since "always". Another corridor was of course the East African plateaus to Southern Africa, but in the long term it has been less influential. In between there are the jungle, the Ethiopian highlands and the Sudd, acting as a buffer.

      Naturally this natural division, still apparent in some genetic aspects (mtDNA L0 is essentially from the East-Southern primeval region, as are some branches of Y-DNA A), has been largely erased by the expansion of Nilo-Saharan pastoralists first and more dramatically by the Bantu expansion since the Iron Age, but it is still detectable for what it was once upon a time.

      My point was to emphasize that the Sudanese corridor was "open" all the time and subject to all kinds of human flows and remixes, and therefore it is unlikely that anything from pre-OoA times (other than haploid lineages, which don't undergo recombination) can be spotted nowadays. I may be wrong, especially as we get into the mountains, but I believe it is a correct precautionary stance.

    2. Regarding the Ethio-Somali component, I believe now I'm getting my own judgment confused by a previous "Ethiopian" component that is clearly a different thing. Let me explain:

      In my 2011 attempt at analyzing North African autosomal components, I used Mandenka, Fulani and Ethiopians (HGDP, Amhara?) as Tropical African controls (and also Saudi Arabs and Spaniards as Eurasian controls). The result for Ethiopians was that they behaved, in essence, as a 1:2 mix of Arabian and Mandenka... but only until K=9. From K=10 onwards Ethiopians developed their own component which took over 95% of the sample's ancestry, but this component was tellingly almost equidistant in Fst distances between Eurasian/North African and Tropical African ones.

      So I concluded that the "Ethiopian" component in that analysis was a locally long-curated amalgamation of Eurasian (Arab in essence) and African origins. This happens: even if the ancestries are very distinct, as would be this case, in the long run they blend into a single "stock" because of continuous recombination within the same population (it also depends on historical population size: smaller endogamous pops. would do that faster than larger more diverse ones).

      But I realize now that my 2011 "Ethiopian" component and this Ethio-Somali component are different things. This Ethio-Somali component seems to approximate better the Eurasian element in my analysis before the "Ethiopian" blend showed up as unique.

      But then again I would not exclude that this Ethio-Somali component is somewhat mixed with some strictly African ancestry since very old times because of its greater distance re. Eurasian components and smaller distance re. African components.

      This may well be a process that also affects the Maghrebi (North African) component and the Arabian one, and it may well be difficult to discern in detail. Ancient populations that no longer exist as such, like North African "Aterians", Arabian populations from the first stage of the "Out of Africa" migration and other ancient groups from Africa itself that are no longer distinguishable, almost surely left a legacy that blurs the Africa-Eurasia distinction to some extent.

      I believe that I could detect some of such OoA time residuals in NW Africa (mostly Southern Morocco) and the Red Sea area (Saudis and one of the two Egyptian samples, see here) and I would imagine that there are other similarly old "hidden" components scattered through all NE and East Africa.

      But discerning them with clarity probably needs more sophisticated work than I am able to perform.

      In any case it seems clear from this study that the Ethio-Somali component is at least largely of Eurasian derivation.


    3. ...

      However you are right re. the Fst distances: the Ethio-Somali component clusters with Eurasians, yes, but only weakly, suggesting ancient admixture. So there may be indeed more than this study can find and assuming this component to be purely or mostly "Eurasian" may be an error.

      Of course there is something (a quite sizable something) Eurasian in that area, as evidenced by haploid genetics (Y-DNA T, J1, various mtDNA lineages), but taking the Ethio-Somali component as "purely Eurasian" may be problematic and a bit over-simplifying. For example, when we look at the f3 admixture bounds for the Amhara, these are lower than the overall estimated admixture, which seems to be derived from an f4 test that I can't easily find (it seems to be the D-stat test, but this one doesn't list the bounds, only the median D score), so I think that there is room for more nuanced interpretations of this component as partly African-derived.

      Judging by the K=10 results, the overall Eurasian element in mainline Ethiopian pops. like the Amhara may well be smaller, c. 55% (visual estimate). When we move to the K=11, the explicit Eurasian element adds up to 28% and the ES component to 34%.

      55-28=27; 34-27=7.

      That would imply that the Ethio-Somali component is ~20% non-Eurasian (7 out of 34; African or something similar like "Aboriginal Arabian" from OoA times) but this is a rough estimate.

      Interesting discussion in any case.

