October 11, 2018

Major Guanche genetic influence in Puerto Ricans (guest article by Thierno)

Guest article by Thierno

A discussion of a 2013 study on Caribbean autosomal ancestry by Andrés Moreno Estrada et al., “Reconstructing the Population Genetic History of the Caribbean,” was posted on this blog:

There were two important elements of information to consider from said post.

1) The ADMIXTURE graphs displayed a "black" component, largely found in Caribbean Admixed Latinos but only poorly represented in South Europe, which suggested a "recent" founder effect some 500 years ago. [Note: "black" here refers to the color coding of an autosomal component in Moreno 2013, not to Tropical African ancestry, which was color-coded as "green". Please follow the link above for more details.]

2) An interesting and informative discussion between a Puerto Rican named Charles, in search of his ancestry, and another blogger named Maju shed light on the little-known historical contribution of Canarian aboriginal Guanches (Berber) to the colonization of America. It is often referred to as the "Tributo de Sangre" (Blood Tribute).

They concluded that the "black" component which was displayed on the ADMIXTURE graphs of the study most likely had a North African origin, by way of Canarian aboriginal Guanche ancestry.


This graph represents the stacked bar-plot of an unsupervised ADMIXTURE exercise which is aimed at studying the complex and intricate ancestral components of Puerto Ricans from Puerto Rico, based on samples that were collected from the 1000 Genomes panel.

The choice of populations that are represented in these ADMIXTURE graphs was firstly made to account for the major, historically known contributors to the Puerto Rican population: Iberians, represented by the Iberian (IBS) samples, and indigenous Caribbeans and enslaved Africans, who are represented by the "Maya" and "Yoruba" samples respectively.

Secondly, the presence of the merged North African samples in the dataset of these ADMIXTURE graphs serves as a formal test of comparison with the Iberian population in order to verify the aforementioned hypothesis.

The graph for K=4 clearly shows the "light-blue" component, represented in the Puerto Rican (PUR) samples, in addition to their Iberian (red), “Maya-like” (green), and “Yoruba-like” (purple) contributions. 

The "light-blue" component is largely restricted to the North African population and also mostly found in the Saharawi samples, making it a "Saharawi-like" component. In other words, it is the identifiable North African component of this ADMIXTURE exercise. 

This finding contrasts with the typically much lower North African scores of Hispanic Caribbeans that are reported in commercial autosomal DNA tests. I suspect that the use of Mozabite samples as proxies for North Africans may conflate their Berber ancestral component with the Iberian ancestral side of their complex genetic makeup.

I included Canarian samples because they still display a minor distinct variation of North African admixture relative to Iberians, although it is important to keep in mind that individuals from those samples, as well as present-day Canarians, are more similar to Iberians from an autosomal genetic standpoint. Moreover, studies of Canarian autosomal DNA have shown disparities in the amount of Guanche (Berber) admixture among individuals from different islands of the archipelago. Canarians from La Gomera seem to have retained the most Guanche ancestry.

Maju had a blog post about a paper on the estimates of Guanche or Berber genetic influence of Canarians here:

Hypotheses made in the recent past about a possible genetic link between Canarian aboriginal Guanches and Puerto Ricans, on the basis of the little-known role that the Canary Islands played in the colonization of the Americas, are supported by these unsupervised ADMIXTURE runs. Hypothetically, they could have similar implications for some Admixed Latinos and specific Caribbean communities, but most notably for Hispanic Caribbeans.

Reasons for investigating this issue

I am a person of Fula descent. I wasn't predisposed to experiment on this issue, in the sense that I have a different ethnic history. With the help of the software ADMIXTURE, I decided to use my autosomal data and compare it with publicly available datasets, which include populations that are compatible with my genetic history. In addition to my Fula-specific and West African ancestral components, which were similarly detected in the populations studied by Henn et al. in 2012, I also scored a North African percentage.

I had noticed that my data matched up considerably with "New World" Afro-descendants but also, very intriguingly, with a large number of Hispanic Caribbeans.

At first, I attributed it to the fact that West Africa was a region from which slaves were sent to the Americas.

However, when I tried to identify what specific ancestral components I share with some of those Hispanic Caribbean matches, a common restricted Northwest African ancestry seemed to emerge as a pattern with several of them. After reading the blog-post of Maju on Caribbean autosomal ancestry - several years after he posted it - and the possible Northwest African hypothesis of Hispanic Caribbeans, I figured I would try to verify it and maybe, at the same time, manage to elucidate some of my questions.


[Note (update Dec-31-2020): the sharing of this very drifted (PUR) component between the complex admixed Puerto Rican samples and my sample is difficult to interpret precisely, and from a historical standpoint, as the Lawson et al. paper makes very clear (please see the last update from 2 years ago). Comments below are clues for follow-up research.]

The graph for K=5 indicates a PUR-specific (green) homogenization of the Puerto Rican samples in comparison to the other populations, which suggests a recent founder effect that most likely took place over the past few centuries.

Very intriguingly, my North African component is replaced by this PUR-specific component rather than by the yellow North African component. It suggests that the "Guanche-Berber" side of the Puerto Ricans overlaps with my Northwest African component.

I would say that it indicates some complex genetic links between the Guanches and, possibly, other Northwest African populations.

I hope that these unsupervised ADMIXTURE exercises can be of help to those interested in the autosomal genetic links between Hispanic Caribbeans and Canarian aboriginal Guanches.



I used publicly available datasets to perform these ADMIXTURE exercises.

The first one contains a combined dataset of populations from both the 1000 Genomes Project and HGDP unrelated samples, for a total of 162,645 SNPs. It has been filtered and re-arranged by its contributors, Peter Carbonetto and Amir Kermany.

It belongs to the Ancestry DNA workshop on GitHub.

All the repositories can be accessed here: https://github.com/Ancestry

It was publicly available until a year ago and was utilized during the Computational, Evolutionary and Human Genomics (CEHG) Symposium.

The PUR (Puerto Ricans in Puerto Rico), IBS (Iberians from Iberia), Maya and Yoruba samples were selected from this dataset.

The second dataset is from the Henn et al. study from 2012, “Genomic Ancestry of North Africans Supports Back-to-Africa Migrations.” It contains the North African samples that I used for the exercises. I merged them with the dataset that contains the PUR samples, and intersected 44,804 SNPs.

This is the link to access it: http://biologiaevolutiva.org/dcomas/north-african-affy-6-0-data-henn-et-al-submitted/

The third dataset is from the Botigué et al. study from 2013, “Gene flow from North Africa contributes to differential human genetic diversity in Southern Europe.”

It has Spain_S (Andalusians), Spain_NW (Galicians), and Canary Islanders. Here too, I intersected 44,804 SNPs with the first (main) dataset.

The link to access it is here: http://biologiaevolutiva.org/dcomas/north-african-affy-6-0-data-henn-et-al-submitted/

I used the software PLINK to update the physical and genetic positions of the SNPs from the second and third datasets, in order to properly merge them with the ones from the first dataset. I also made sure to merge only SNPs that were already found in the selected dataset (1000 Genomes and HGDP).
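The SNP restriction described above can be sketched as follows. This is a minimal Python illustration, not the PLINK commands that were actually run: it assumes PLINK's six-column .bim layout (chromosome, variant ID, genetic distance, base-pair position, two alleles) and that variant IDs are comparable across datasets. The function names and the toy records are hypothetical.

```python
def read_bim_ids(lines):
    """Collect variant IDs (second column) from .bim-formatted lines."""
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            ids.add(fields[1])
    return ids


def shared_snps(main_bim_lines, other_bim_lines):
    """Return the sorted variant IDs present in both datasets."""
    return sorted(read_bim_ids(main_bim_lines) & read_bim_ids(other_bim_lines))


# Toy records, invented for illustration (not taken from the real datasets).
MAIN_BIM = [
    "1\trs0001\t0\t1000\tA\tG",
    "1\trs0002\t0\t2000\tC\tT",
    "2\trs0003\t0\t3000\tG\tA",
]
OTHER_BIM = [
    "1\trs0002\t0\t2000\tC\tT",
    "2\trs0003\t0\t3000\tG\tA",
    "3\trs0004\t0\t4000\tT\tC",
]

if __name__ == "__main__":
    for snp_id in shared_snps(MAIN_BIM, OTHER_BIM):
        print(snp_id)
```

Written one ID per line to a text file, such a list is the kind of input that PLINK's `--extract` option accepts in order to keep only the listed variants before merging.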

Lastly, I intersected my personal data with the dataset (1000 Genomes and HGDP), for a total of 161,764 SNPs.

The software ADMIXTURE was used to estimate ancestry.

R was used to plot the estimates.
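As a rough illustration of how the ADMIXTURE output feeds the bar plots, the sketch below averages ancestry proportions per population. It is written in Python rather than R, purely as an illustration, and assumes the usual layout of an ADMIXTURE .Q file: one whitespace-separated row of K proportions per individual, in the same order as the samples in the input. The function names, labels and toy values are hypothetical and are not results from this analysis.

```python
def parse_q(text):
    """Parse the text of a .Q file into a list of rows of floats."""
    return [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]


def mean_by_population(q_rows, labels):
    """Average each of the K components over the individuals of a population."""
    if len(q_rows) != len(labels):
        raise ValueError("one population label is needed per individual")
    sums, counts = {}, {}
    for row, pop in zip(q_rows, labels):
        if pop not in sums:
            sums[pop] = [0.0] * len(row)
            counts[pop] = 0
        for k, value in enumerate(row):
            sums[pop][k] += value
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in sums[pop]] for pop in sums}


# Toy K=2 output for three individuals (invented values).
TOY_Q = "0.50 0.50\n1.00 0.00\n0.00 1.00\n"
TOY_LABELS = ["PUR", "PUR", "IBS"]

if __name__ == "__main__":
    for pop, means in mean_by_population(parse_q(TOY_Q), TOY_LABELS).items():
        print(pop, means)
```

The per-individual rows are what a stacked bar plot displays directly; the population means are only a convenient summary for comparing groups.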

Update (Oct 30th): 

I would like to briefly elaborate on the sampling strategy. The first ADMIXTURE runs that I produced contained additional continental European populations, as well as other West Asian samples. The display showing the distinctive ADMIXTURE coded colors between North African and European samples of the dataset appeared at higher K values, with their respective higher standard errors of the cross-validation error estimate.

I had asked for Maju’s insight on admixture analyses in the past, as I was interested in how his posts on West African and Berber genetics related to my personal autosomal DNA. I did the same for this analysis.

I followed Maju’s recommendations to limit the selection of the reference population to be analyzed to just 4: Iberians, West Africans, Mayans, and Northwest Africans. This resulted in the clear and distinctive display of Berber and Iberian components, starting at K=4 which has a lower standard error. I later added Canary Islander samples.
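The comparison of K values mentioned above can be sketched as follows. This minimal Python example assumes that ADMIXTURE was run with its `--cv` option and that each log contains a line of the form `CV error (K=4): 0.54`. It reads only the cross-validation error itself, not its standard error. The function names and the toy numbers are hypothetical and are not the values obtained in this analysis.

```python
import re

# Pattern for the cross-validation line assumed to appear in each log.
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def parse_cv_errors(log_text):
    """Map each K found in the log text to its cross-validation error."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors):
    """Return the K with the lowest cross-validation error."""
    if not cv_errors:
        raise ValueError("no CV error lines found")
    return min(cv_errors, key=cv_errors.get)


# Toy log excerpt (invented values).
TOY_LOG = (
    "CV error (K=3): 0.560\n"
    "CV error (K=4): 0.540\n"
    "CV error (K=5): 0.550\n"
)

if __name__ == "__main__":
    errors = parse_cv_errors(TOY_LOG)
    print(errors, best_k(errors))
```

A lower cross-validation error is only a guide to a sensible K; it does not make the corresponding clusters "real" populations.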

Note: I have also been asked to replace Yoruba with Senegambian Mandinka samples to check for potential differences. This is something that I had already checked, but I didn't notice any difference in either the Berber percentage in Puerto Ricans or in their homogenization, which indicated a recent founder effect.


Update March 14th 2019:

After this article was posted last October, I received a lot of interesting feedback on the admixture analyses and suggestions for different ancestral contributions of Hispanic Caribbeans, both in private messages and in the comment section of this post. In light of this, I would like to go over some aspects of the analysis again.

A note of caution in the interpretation of estimates

The estimates of the clusters from ADMIXTURE are not to be interpreted literally. The different ancestral k components are not “real” populations. They are designed to help identify differentiation between populations.

Both supervised and unsupervised analyses will produce FST values between the designated populations or between the clusters. They serve to evaluate genetic variation only approximately. In this type of analysis, as we can observe in the graphs contained in this post, moderate amounts of the components that are less divergent from each other overlap across populations which share lower FSTs. Considering that the FST between North Africans and West Eurasians is usually around 0.06, there will inevitably be a shared overlapping effect. As a result, it isn’t possible to obtain a very precise delineation between North African and Iberian samples. So, essentially, this is an evaluation of variation and not an accurate system of measurement.
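To give a sense of what an FST value measures, here is a minimal Python sketch of the textbook single-locus definition for two equally weighted populations, FST = (H_T - H_S) / H_T, where H_S is the mean expected heterozygosity within the populations and H_T is the expected heterozygosity of the pooled population. It is an illustration under those assumptions, not the estimator that ADMIXTURE applies to its inferred clusters; the function name and the allele frequencies are hypothetical.

```python
def fst_two_populations(p1, p2):
    """Single-locus FST for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie between 0 and 1")
    # Mean expected heterozygosity within the two populations.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled population.
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0.0:
        return 0.0  # same allele fixed in both populations: no variation
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    print(fst_two_populations(0.4, 0.6))
```

With allele frequencies of 0.4 and 0.6, this gives an FST of about 0.04, which shows how modest frequency differences at a locus translate into the low values discussed here.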

Intuitively, it seems that analyses which contain populations or clusters that are separated with higher FSTs will be more robust. It also seems that when FSTs fall below 0.05, the degree of differentiation in the displayed clusters is difficult to evaluate or make sense of. This may explain why analyses of intra-European/Mediterranean populations with FSTs that are around 0.01 are difficult to evaluate with ADMIXTURE. Other steps can be taken to mitigate the effects of linkage disequilibrium, as was the case for the dataset that was used for the analysis in this post.

ADMIXTURE works better for recently admixed groups who derive their ancestry from distinct populations. For obvious historical reasons, African Americans and Hispanic Americans have recent ancestries that most admixture analyses can detect fairly well.

Evidently, the full complexity and chaotic processes of ancient migrations, which are dynamic rather than static, cannot realistically be captured by ADMIXTURE. A complete reconstruction of such patterns on the basis of present-day populations would obviously be misleading.

Daniel J. Lawson, Lucy van Dorp and Daniel Falush wrote a paper called, “A tutorial on how (not) to over-interpret STRUCTURE/ADMIXTURE bar plots” (2018) in which they warned against some of the pitfalls of admixture analyses.

While it’s not possible to make exact predictions from tools that are used in the field of population genetics, when interpreted correctly, some interesting information can still be extracted from various analyses.

Previous research on the possible Canarian legacy in America, including the examination of historical records, had been conducted prior to the publication of the Moreno et al. (2013) paper. With regard to the genetic affinity of the aboriginal inhabitants of the Canary Islands, a similar analysis was done more recently on the autosomal DNA of ancient Guanche individuals who may have lived between the 7th and the 11th century, and is discussed in a paper by Rodríguez-Varela et al. (2017). The authors conclude that a Northwest African-specific ancestry component makes up the majority of their autosomal ancestry, as it does in other Berber populations from North Africa. Additionally, Y-DNA E1b-M81, which is found at high frequencies in Northwest Africa, was also detected in these samples.

In the study from Arauna et al. (2016) on how “Recent Historical Migrations Have Shaped the Gene Pool of Arabs and Berbers in North Africa,” the authors expressed doubts about the use of Mozabite samples as the sole proxy for North African genetic diversity. 

Considering that the paper from Moreno et al. didn’t have North African samples, the focus of this post was to explore potential variations by including Northwest African samples such as Moroccans and Saharawis.

Naturally, to exactly what extent inhabitants of the Canary Islands – whose gene pool could have already been affected by the DNA of Iberian settlers - may have impacted the genetic pool of Hispanic Caribbeans is a question which would require further and more diversified analyses.

mtDNA L(xM,N)

Several studies have reported mtDNA L(xM,N) among various Latin American communities. They strongly suggest recent African ancestry in the context of the recent colonization of the New World. L(xM,N) lineages that have formed specific subclades which are not native to Africa, but are instead found on other continents or in other regions, appear to be extremely rare.

In 2012, Cerezo et al. published a paper on the subject, titled “Reconstructing ancient mitochondrial DNA links between Africa and Europe.”

Another study, published by Ricardo Rodriguez-Varela and his colleagues, is called “Genomic Analyses of Pre-European Conquest Human Remains from the Canary Islands Reveal Close Affinity to Modern North Africans.”

More recently, a paper called “Mitogenomes illuminate the origin and migration patterns of the indigenous people of the Canary Islands” was published by Rosa Fregel with the mtDNA sequencing of 48 ancient individuals. Out of all of the L(xM,N) lineages that were analyzed, only the newly defined L3b1a12 was identified as a new Canarian-specific lineage.

It appears that European and Canarian autochthonous mtDNA L(xM,N) lineages form subclades which correspond to specific mutations that are less likely to be found in Africa.

In the case of Puerto Ricans, there was a project from National Geographic called “Genographic Project DNA Results Reveal Details of Puerto Rican History” (2014). After sampling 326 individuals from southeastern Puerto Rico and Vieques, they found that 80% of Puerto Rican men carry West Eurasian (or European) Y-DNA paternal lineages, while 60% of Puerto Ricans carry maternal lineages of Native American origin. This may shed some more light on the findings of Moreno et al. (2013), who wrote of the “Latin-European” component which seemed to indicate a founder effect.

In contrast, it would be interesting for future research to sample Hispanic Caribbean communities where African ancestry may have been retained in higher proportions and, in the process, collect more mtDNA and Y-DNA.



  1. Holy crap, looks like you may be on to something! I always assumed PRs had a little black, but after reading further it does appear they have quite a bit recent Canary from immigration. Makes sense.

    This is newsworthy

    1. Thanks! Yes indeed, the black color of the graphs from ADMIXTURE in Moreno 2013 didn't suggest a Tropical African source.
      It was the actual "green" component of the study, a Yoruba-like component, to be precise, which indicated this source.

    2. This is great work. I am Dominican and I have many Fulani relatives, some sharing as much as 16 cM, all on two segments. In my triangulations I can tell one of the segments is purely North African, not surprising since Guinean Fulanis I've seen have upward of 20% Fulani. I have seen my Fulani relatives' matches on GEDmatch and there are quite a few Puerto Ricans. So my only critique is this black component can be a trifecta of Senegambian Berber via West African admix, Guanche Canarian admix and Moorish admix in the Canaries and Spain.

      In the new 23andMe update they included Fulani samples, and for some of my matches North African became Senegambian while for others it didn't. I think the truth lies in those who are conclusively not North African via Fulani.

    3. Hello,

      Thank you for your comment and your interest in this post. You bring many interesting issues that I would like to elaborate on and clarify. It could also be informative to other users, who are interested in them, as well.
      Given the widespread practice of Slavery in the Americas and the origins of the enslaved West African people who were displaced, it is normal that your data match with Fulani who tested, and more generally, West Africans.

      Before I start addressing your findings, I would like to briefly go over what the scientific literature, as well as other corroborating independent analyses, including those done by Maju and me, has suggested about the autosomal DNA of Fulani.

      In 2012, a genetic study was published (Henn et al. 2012):

      Here is a link https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002397

      or see the link of the post on this study that was made by Maju on this blog:


      They have highlighted the different ancestral categories of North African and other tropical African populations by including the Yoruba, the Bulala, the Fulani, the Maasai and Luhya. It covered a wide range of North African populations from the Western Sahara to Egypt. Of course European Basques and Middle Eastern Qatari were included for "control" purposes.

      The software ADMIXTURE was used to perform several exercises in order to identify and separate different clusters.

      The method was unsupervised, meaning that the researchers didn't arbitrarily select clusters. They employed a good sampling strategy in which all the necessary proxies were included, and let the clusters form by themselves up to k=10.

      You will notice in the study that North Africans have mostly the light blue component, which is the Berber component, except for the Egyptians and Libyans, who have the purple component as their main source of ancestry.

      At k=8, the peculiarity of the Fulani is clear: in addition to their West African ancestry, they show what looks like an ancestral component similar to that of Berber ancestry.

      At k=10 (see section “Supporting Information”: Figure_S1.tif), the Fulani form a "Fula"-specific component restricted to them only, indicating an old trans-Saharan admixture event which, after enough generations of endogamy, led to a distinctive genetic population in tropical Africa.

      Intriguingly, Maju had run pretty much the same exercises, which corroborated the findings of Henn, just weeks earlier.

      See here: http://forwhattheywereweare.blogspot.com/2011/12/north-african-genetics-through-prism-of.html

      At k=8, he had noticed the same “Fula-specific” component as in the Henn study.
      His description:
      “A Fulani-specific component shows up. Intriguingly it is almost equidistant by Fst measure from the Mandenka and the Sahrawi components (0.105 and 0.115 respectively). All the North African specific components are much closer to West Eurasian ones than to the Mandenka component, so this might suggest a very old kind of trans-Saharan admixture, then homogenized in a single component.”

    4. ...

      I also ran the same exercise with the same populations, using ADMIXTURE, and had the same findings.

      I used my raw data (about 690k SNPs) and converted it to binary .bed files. I was then able to find the dataset with the same populations found in the study.
      I properly merged my files with the available data set using PLINK and its manual.
      I used the ADMIXTURE and R software (all free and available to the general public) and ran the exercise, unsupervised, several times.
      At k=5, I had all the clusters found in the Henn study for k=10.
      My "Fula" specific component is almost 40%.
      I have the dark blue component which is predominantly found and restricted to North African groups for 16-18%.
      And finally I score for about 42-44% West African (most likely Senegambian, given other runs, I have made) ancestry, in this case similar to the Yoruba group, serving as control for "tropical Africans".

      None of the Fulani samples from the dataset scored or displayed the dark blue/North African component, except me, after the formation of the “Fula-specific” component.

      It’s a peculiarity that applies to me and my private genetic ancestry. It should not be confused with the pattern of the genetic makeup of Fulani (composed of West African and North African ancestry), observed in all the studies and experimentations that I have just mentioned, which seem to indicate some old and complex trans-Saharan admixture events.

      It’s precisely the reason why I was intrigued to investigate this side of my genetic ancestry, following what I had already identified as shared North African ancestral components, without any additional tropical African category, with other Latinos and Hispanic Caribbeans, independently of my known Fula and West African ancestry.

      There are Fula communities in almost every West African country, across the Sahel, and all the way to Saudi Arabia. In many cases they have been there for several generations, and for some of them, as their autosomal DNA seems to suggest, there are reported cases of intermixing with local populations.

      Anecdotal and unexpected ancestral contributions from other populations of the region or up North in more recent times, as it is the case for me, can also be manifested and should be understood in the proper context of local historical developments.

      I didn’t remove my sample because it’s possible that other people who are West Africans, not just Fulani, may have similar unexpected findings with their autosomal DNA, Y-DNA, or mtDNA, which could suggest an external source of ancestry due to regional or nearby historical developments.

      My data mostly match with Afro-Americans and West Indians, and occasionally with Hispanic Caribbeans or Latinos. When I used the chromosome browser tool from GEDmatch and managed to pinpoint very precisely what segment I shared with many of them, it was a West African component, occasionally with additional traces of West Asian or North African ancestry.

      All indicative of shared Fula or Senegambian ancestry.
      I had also identified shared North African ancestral components, without any additional tropical African category, with other Latinos and Hispanic Caribbeans. This is the side of my genetic ancestry that I investigated when I focused on my North African, non-Fula-specific component, as I stated in this post.

      My best guess for this restricted Northwest African/Berber link with Latinos and Hispanics, more generally, would point toward the presence of Moroccans (of North African origin) in the West African region: hypothetically, by way of the remnants of the documented presence of the Moroccan empire in the region, or some Tuareg blood from the Sahara.
      See here:




    5. ...

      I would be extremely cautious of the reported values or date estimates that are produced by commercial DNA tests. They don't seem to be reliable.

      You may have heard of IBS (identical by state) vs. IBD (identical by descent). 16 cM, even according to those companies, is a low value to be considered relevant for genealogical timeframes, and would indicate either a very distant relative or a false positive.

      Date estimates, at this point, aren’t very reliable. Maju has repeatedly questioned on this blog the method of determining the molecular clock, which seems to be inappropriate and, from what I remember, not adequately in line with what more robust findings in other scientific fields seem to suggest: a much older time estimate for the OoA (Out of Africa) theory. I don’t know the methodology that is used by commercial DNA companies, but I would assume that they are using the wrong estimates, which in any case are likely too recent in time to be accurate.

      It is also important to keep in mind that your MRCA (most recent common ancestor) that you share with any present-day match who may belong to a particular ethnic group, may have possibly belonged to a different ethnic group or population. Fulani have traditionally strongly practiced polygamy and frequently inter-married with people who belonged to other ethnic groups.

      DNA tests are also more popular among people of Fula descent and more generally ethnic groups who are located at the periphery of some geographic areas in Africa, who may have heard various “enigmatic” hypotheses about their origins, which results in more interest and curiosity than other West Africans.
      It’s possible that Fulani, just like Somalis or Ethiopians in East Africa (with different ancestries, of course), are overrepresented as a customer base compared to other African groups.

      In cases where you have Fula matches and have access to the chromosome browser tool, which allows you to identify the segments you may share with them, you cannot separate the shared segments which appear to be solely so-called North African from the tropical African ones. It doesn’t matter if they are different segments (tropical African and Northwest African), as long as they come from the same individual. It would evidently suggest some possible Fula ancestry which, as I said, seems to be quite old and most likely involved some trans-Saharan admixture. This is different from regular present-day North African ancestry.


    6. ...

      In the paper by D’Atanasio et al. (2018), “The peopling of the last Green Sahara revealed by high-coverage resequencing of trans-Saharan patrilineages”:


      Interestingly, haplogroup E-M2_Z15939 is present at high frequencies among different Fulbe groups.

      But also found in other groups:

      5.56% for the Asni Berbers
      50% for the Fulbe of Nigeria (North)
      25% for the Mandenka of Senegal
      36.36% for Gambian in Western Division-Mandenka
      14.29% for Mende in Sierra Leone
      3.33% for Burkinabese
      22.73% for Tuaregs from Niger
      15.19% for Fulbe from Cameroon (North/Chad)
      2.40% for Dominican from Santo Domingo and 3.03% for Mexican in Los Angeles

      The authors wrote: "Interestingly, the geographic distribution of this clade (Fig. 3b) perfectly traces the Fulbe migration from western Africa, where this haplogroup is also common in other ethnic groups, to central Sahel, where the same haplogroup is only found among Fulbe populations."

      It may be one of the founding lineages for Fulbe and/or indicative of recent Senegambian paternal origin.

      “So my only critique is this black component can be a trifecta of Senegambian Berber via West African admix, Guanche Canarian admix and Moorish admix in the Canaries and Spain.”

      Given previous ADMIXTURE exercises of the same study from Moreno 2013 and what we know now, it is highly unlikely.

      If you look at the ADMIXTURE runs that precede the formation of the so-called “black” component in the study from Moreno 2013, at k=3, you will notice that Cubans, Dominicans, Puerto Ricans, Colombians, Hondurans, and Mexicans displayed, overwhelmingly, the “red”/European/West Eurasian component, with some low to moderate tropical African/“green” component as well as some minor “blue”/Native Central American ancestry.

      The same pattern was observed in the experiment of this post: at k=4, the Puerto Rican samples are mostly composed of the North African/blue component and the red/Iberian component, with minor tropical African and Native American ancestry.

      North African genetics have a much greater affinity with West Eurasia than with Africa, except for the presence of Y-DNA E1b, of course.


    7. ...

      Senegambian should not be described as “Berber”. I would respectfully encourage you and others, who may be interested in these issues, to become more familiar with the indigenous populations of West Africa, the local ethnic groups of West Africa, in their diversity.

      Berbers are indigenous to Northwest Africa or what some people in other languages can sometimes refer to as the “Maghreb”. They are not West Africans.


      In the last posted update, I have indicated that I had actually replaced the Yoruba samples with the Mandinka ones (from Senegal) and didn’t notice any difference in the output.
      Regardless of it, the use of Yoruba shouldn’t be interpreted literally. It acts as a proxy for tropical West African ancestry here.

      By “Moorish” I am not sure what specifically is being referenced here. If I am not mistaken, the so-called Moors of Iberia were Muslims, of North African ancestry from Northwest Africa.

      Maju is more familiar with this issue than I am, and has blogged extensively on Iberian genetics on this blog.
      I don’t know to what degree the presence of the Moors may have influenced the genetic pool of Iberia.

      There is also an interesting blog called “Tracing African Roots”, which focuses on West African ancestry and helps to interpret every aspect of commercial DNA tests reasonably.
      It explains and deals with their shortcomings, while still managing to get valuable information for Afro-descendants.


      I would highly recommend it. Even though it deals with Ancestry DNA, I think that Felipe’s recommendations and observations are still applicable to other companies.

      He has reported extensively on Upper Guinean ethnic scores for Cape Verdeans, Dominicans and Puerto Ricans in his blog. He made various hypotheses which attempt to explore possible ancestral paths for some groups with documented links.

    8. All I can add is that "Moor" or "Moorish" derives from the ethnic Berber (Amazigh) name of Mauri, who lived in North Morocco and West Algeria in the Phoenician/Roman era and had a kingdom (later Roman province) called Mauretania. In the Medieval context of Spain that name became generic for Muslims (also those of Iberian or Slavic origin) and is still used to this day colloquially to refer to NW Africans particularly Moroccans (with some caution because it can be perceived as racist or xenophobic depending on tone and context). It's an endonym in any case (a self-given name of deep historical basis) and I know it was also used in West Africa as arabized NW Africans expanded over there in various waves (and thus the name Mauritania) and in some ethnic maps of Mali you can see the label "Moor" scattered here and there referring to groups that arrived or were assimilated with the Moroccan conquest of Shonghai in the 16th century.

      So I would say that Moor or Moorish is correct when applied to NW Africans, notably if we want to blur the line between those who consider themselves Amazigh (Berber) and those who consider themselves Arab. It's also loosely correct when applied to the Muslim state of Al Andalus, particularly when it was incorporated into the NW African realms of the Almoravids and Almohads in the late phase of the so-called Reconquista. However, most of the time the Cordoba Realm or Al Andalus (first a dependent Emirate, later an independent Emirate and finally a Caliphate, as the ruling Umayyads, extremely Basque-ized by maternal ancestry, sought to reestablish their claim to the overall Sunni Caliphate, with no practical effects) was independent from NW Africa and for sure had native Iberian Islamized elites, at least in part. We see that in the dynasties that emerged in the Taifa period, when the Caliphate broke apart into its provinces: most dynasties were Andalusi (i.e. Iberian Muslim), although a few were Berber (Málaga and Almería notably) and others were "Saqaliba" (i.e. "Slavic", Mamluk-like, these in the East: Valencia, Denia).

    9. Erratum: "Songhai", not "Shonghai".

      More on Moors: https://en.wikipedia.org/wiki/Moors

    10. Thierno, I was the one who questioned above. Great detailed explanation!

      I am seeing this Fulani-specific component in the new 23andMe being clustered under Senegambian. For example, my grandpa, with all the Fulani matches, overlaps with 5 Mexicans, and these are large segments; I'm talking about 15-21 cM.

      But basically, the "North African" segment, as it shows on my grandpa on chromosome 12, from position 11 to 21, shows as "Senegambian and Guinean" for his Mexican match. So it seems in my case the North African I share with Fulanis is Fulani-specific.

      I would also add: you say your North African component did not get swallowed by the Fulani-specific component. Might there be multiple Fulani-specific North African components? And perhaps your clan has a different one from the folks tested? The Fulanis in most datasets are from Burkina Faso, to my understanding.

      Also, I'd love to chat on gchat or any messenger about how you ran ADMIXTURE and PLINK to do this. I did this many years ago and need a refresher.

    11. I did a blog post some time ago on my fulani matches.


    12. “I am seeing this Fulani-specific component in the new 23andMe being clustered under Senegambian. For example, my grandpa, with all the Fulani matches, overlaps with 5 Mexicans, and these are large segments; I'm talking about 15-21 cM.”

      Your description of a “Fulani-specific” component in the context of the so-called “Senegambian” category from 23andMe is incorrect.


      The category called “Senegambian & Guinean”, according to their own description, groups people from Senegal, Gambia, Guinea-Bissau and Guinea. Needless to say, it includes many different ethnic groups that are found all across the region. In addition, the method used by commercial DNA companies produces imprecise results.

      This is what I wrote to Fonte on this issue: “the tested data, as well as the preselected samples, are isolated in categories ( such as Senegambian from 23andme) that have been "arbitrarily" selected. They don't let each individual sample form itself. So all the existing individual variations of the dataset are ignored. This element, I suspect, increases errors, and produces more bias issues.”

      The Fulani-specific component which was identified in Henn’s study at K=10, by Maju, as well as in my work, is produced by using the unsupervised mode of ADMIXTURE, where individual members of specific “ethnic categories” are distinctively identified.
      It’s completely different from the approach that consists of selecting people on the basis of recently established borders, countries or regions which don't reflect the genetic or even the linguistic diversity. This is why I suggested to you and others to become more familiar with the history of the region.
      Commercial DNA tests can differentiate between continentally and (in some cases) regionally separated populations. With endogamous or ancient populations, the true ancestral components won’t be detected and identified.



    13. “The Fulanis in most datasets are from Burkina Faso, to my understanding.”

      Please check the links that I shared with you. They are full of information. The Fula samples from Henn’s study, from Maju’s exercises, and from my work are from Nigeria. Check the map of the study.
      I am not aware of any study on the autosomal DNA of Fulani from Burkina Faso.

      I wouldn’t want to throw the baby out with the bathwater. I understand that for many Afro-descendants, as well as other communities, commercial DNA tests allow them to get information about the origin of their enslaved ancestors and/or immigrant ancestors.
      However, I think that excessive wishful thinking without credible evidence, may sometimes get in the way of improving our knowledge.

      In the conversations, I had initially with Maju, on the genetic origin of Fulani people, I shared with him narratives that are commonly found among different Fulani communities, that I've always heard without ever having the opportunity to verify them.
      When he showed me scientific evidence (genetic studies, his exercises), which oriented and pointed to older and different sources of likely origins, and in the process kept mentioning ADMIXTURE, I was interested to learn on my own, how to use it, and verify my private autosomal DNA more accurately.

      It led me to compare my data with other Fulani samples. My understanding of the complex genetic origin of Fulani people has improved, as a result.

      In the field of genetics, the benefit of science is also to have reliable standards for accepting hypotheses or claims. Of course, new studies or new findings can always help modify or refine current knowledge, or orient it in different directions.

      After ADMIXTURE, I experimented with many types of software that are used in the recent studies on autosomal DNA by current geneticists. Naturally, each of them has its pros and cons, and ultimately its own limitations, depending on a myriad of factors and what’s expected from the user. For present-day modern populations, with those peculiarities, ADMIXTURE works fine, if “the sampling strategy” makes sense.
      For ancient populations, qpAdm and f4/D statistics might be better adapted than ADMIXTURE, just to give an example. Checking the standard errors and adjusting for them is also key.


    14. “But basically, the "North African" segment, as it shows on my grandpa on chromosome 12, from position 11 to 21, shows as "Senegambian and Guinean" for his Mexican match. So it seems in my case the North African I share with Fulanis is Fulani-specific.”

      On GEDmatch, what I referred to is in the Admixture utilities section, where it says “Paint differences between 2 kits, 1 chromosome”, using Dodecad, for example, which has a North African category that’s created not just from Mozabite samples but also from Moroccan ones. The chromosome painting tool is precisely what I used. I didn’t identify this restricted and shared common North African component by just relying on chromosome positions. Position alone doesn’t tell you the type of admixture that’s shared with other matches.
      Based on the information that you provided, you have no evidence to claim a Fulani-specific component. If you don’t properly compare your data with the right populations using the unsupervised mode of ADMIXTURE, there is no way to know.

      “I would also add: you say your North African component did not get swallowed by the Fulani-specific component.”

      It’s what the results from the experiments with my data suggest. That’s the idea behind those investigations. To make hypotheses, based on the outputs.


    15. “may there be multiple fulani-specific north-african components? and perhaps your clan has a different one from the folks tested? “

      It’s not impossible and it can’t be dismissed, considering that all existing Fulani communities haven’t been sampled, tested and compared to each other.
      However, based on the accumulated and most recent serious evidence on the subject, there is systematically a common old trans-Saharan component.
      If there are differences, I would say that it is mostly due to inter-mixing with local populations of the geographic area where they live.

      There are several studies other than the ones that I have already mentioned to you with links.

      This study, “Begona Dobon et al., The genetics of East African populations: a Nilo-Saharan component in the African genetic landscape. Nature – Scientific Reports 2015"

      Here is the link: http://www.nature.com/articles/srep09996

      Maju made a post on it some years ago.


      If you observe the Fulani samples from Sudan, at K=3, they shared ancestry with local Sudanese ethnic groups. At k=5, however, they display an additional unique specific component of the color pink. The authors have suggested:
      “A population that shows signals of recent admixture is the Fulani. Fulani are nomadic pastoralists who speak a Niger-Kordofanian (Niger-Congo) language and occupy a large area in Africa’s Sahel. Their origin is still controversial, as mitochondrial DNA indicates a West African and traces of North African origin23, whereas Y-chromosome studies showed shared ancestry with Afro-Asiatic and Nilo-Saharan Sudanese populations8. This shared ancestry with East African populations can be seen in Fig. 3 (k = 3), suggesting that they have admixed with local populations.”

      Secondly, the paper Gurdasani et al. 2014 showed in Figure 7b an exercise which included Fula from Gambia, dating one Eurasian admixture event to around 100 generations ago and a second one to between 320 and 780 generations ago.

      Lastly, the paper Busby et al., 2015, Figure 6-figure supplement 1, shows that the penetration of IBS and TSI gene flow into Africa before 0CE was via Northeast Africa.
      The authors stated that they had identified some southern European ancestry in the Fulani, which suggested that it might have entered West Africa via Northeast Africa.

      This is a passage from the paper: “The Fulani, a nomadic pastoralist group found across West Africa, were sampled in The Gambia, at the very western edge of their current range, and have previously reported genetic affinities with Niger-Congo speaking, Sudanic, Saharan, and Eurasian populations [Tishkoff et al., 2009; Henn et al., 2012], consistent with the results of our mixture model analysis (Figure 4A). Admixture in the Fulani differs from other populations from this region, with sources containing greater amounts of Eurasian and Afroasiatic ancestry, but appears to have occurred during roughly the same period (c. 0CE; Figure 5).”

      Check this link here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4915815/

      or this one https://elifesciences.org/articles/15266

      On this blog, Maju made a quick post on it.


      This is likely the same source of admixture. So already, there are Fulani samples from Nigeria, Sudan and Gambia, plus my anecdotal data, for what it’s worth, since my grandparents were from the Fouta Djallon region of present-day Guinea.



    16. The Y-DNA study in the D’Atanasio 2018 paper suggests a strong correlation between haplogroup E-M2_Z15939 and Fulani communities from various areas in the Sahel.

      My closest matches in the commercial DNA database are overwhelmingly Fulani people from various countries: Senegal, Nigeria, Sierra Leone, Mauritania and Mali, all the way to Saudi Arabia.
      I had the opportunity to write emails with some of them. Typically they date the presence of their families (in most cases the Adamawa clan) in areas such as Northern Nigeria, or the Red Sea area, no earlier than 2 centuries ago. And it would correspond to the historical events which occurred in West Africa, 2 centuries earlier.


      Considering the timescales in the field of genetics, that’s almost no time. It’s very recent. I just don’t see how, in 2 centuries, various Fulani communities which abandoned their nomadic lifestyle a long time ago and inter-mixed with other local ethnic groups could have formed other so-called specific components.

      When I used ALDER and MALDER, I progressively modified the dataset and ended up eliminating sources with which I have no admixture. Based on my admixture results, I only selected the following populations: Mandenka; Yoruba; Gambian; Esan; Mende; Moroccan; Saharawi; Tunisian; Algerian; Mozabite

      The only successful exercise with a 2-ref weighted LD significant curve was obtained with the Yoruba and the Moroccan samples. It failed for the other samples.
      It estimated 18.76 +/- 9.07 generations for the signal.
      Even using Maju’s correction of time estimate, by 1.5 or 2, it falls deep into the period of the Berber dynasties and presence in the region.
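      To make the arithmetic behind that estimate explicit, here is a rough worked conversion from generations to calendar years. It is only a sketch under stated assumptions that are not part of the ALDER output: a generation time of 29 years, 2018 as the sampling year, and the reading that the 1.5 or 2 correction multiplies both the estimate and its standard error.

```python
# Rough conversion of an admixture date from generations to calendar years.
# Assumptions (not from the ALDER output): 29 years per generation,
# 2018 as "present", and a multiplicative correction factor.

def admixture_year(generations, se, years_per_generation=29.0,
                   correction=1.0, sampling_year=2018):
    """Return (central, earliest, latest) calendar years for the signal.

    The estimate and its standard error are scaled by `correction`, then
    converted to years; the range is the estimate +/- one standard error.
    """
    scaled = generations * correction
    scaled_se = se * correction
    central = sampling_year - scaled * years_per_generation
    earliest = sampling_year - (scaled + scaled_se) * years_per_generation
    latest = sampling_year - (scaled - scaled_se) * years_per_generation
    return round(central), round(earliest), round(latest)


if __name__ == "__main__":
    for factor in (1.0, 1.5, 2.0):
        central, earliest, latest = admixture_year(18.76, 9.07, correction=factor)
        print(f"x{factor}: about {central} CE (range {earliest} to {latest} CE)")
```

      The wide range is a reminder that a standard error of about half the estimate leaves the date loosely constrained, whichever generation time is assumed.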


      “ how you ran ADMIXTURE and PLINK to do this, I did this many years ago, and need a refresher.”

      I downloaded my data (around 690k SNPs) from the DNA company. I converted it to the PLINK format.
      Various websites have scripts and tools which you can download and use.
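      One possible way to do the conversion, sketched here as command lines built in Python. This is not necessarily the tool that was used for this post: it assumes the raw export is in the 23andMe text layout, which PLINK 1.9 can read with `--23file`, and every file name and prefix below is hypothetical. The commands are only printed, not executed.

```python
# Sketch: PLINK 1.9 command lines for converting a raw export and merging it
# with a reference panel. All file names and prefixes are hypothetical.
import shlex


def conversion_commands(raw_file="my_raw_data.txt", own_prefix="me",
                        reference_prefix="reference", merged_prefix="merged"):
    """Return the two command lines as lists of arguments."""
    # Step 1: raw 23andMe-style text file -> PLINK binary fileset (.bed/.bim/.fam)
    convert = ["plink", "--23file", raw_file, "--make-bed", "--out", own_prefix]
    # Step 2: merge the personal fileset into the reference dataset
    merge = ["plink", "--bfile", reference_prefix,
             "--bmerge", f"{own_prefix}.bed", f"{own_prefix}.bim", f"{own_prefix}.fam",
             "--make-bed", "--out", merged_prefix]
    return [convert, merge]


if __name__ == "__main__":
    for cmd in conversion_commands():
        print(shlex.join(cmd))
```

      In practice the merge step often fails on the first attempt because of strand or position mismatches, which then have to be resolved before retrying.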


      I installed Ubuntu. It makes things easier.
      I found the data from the published studies. Check the links in the appendix section.
      Also check: https://reich.hms.harvard.edu/datasets
      It has datasets from Lazaridis.

      You would have to become familiar with PLINK and its manual just like with the ones from ADMIXTURE and R.
      The genetic and physical positions of your files and the data you want to merge have to match, depending on the type of genomic assembly.
      The type of SNPs from the data you want to merge, has to correspond to the ones of the dataset you are interested in.
      Ideally, you want to perform ADMIXTURE’s cross-validation procedure, because it will help you find which K value has the lowest CV error.
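      As an illustration of the cross-validation step: when ADMIXTURE is run with `--cv`, each log contains a line of the form `CV error (K=3): 0.47835`. The sketch below collects those lines and reports the K with the lowest error, which is the usual way to read them. The sample log text uses made-up numbers, not results from this post.

```python
# Sketch: pick the K with the lowest cross-validation error from ADMIXTURE logs.
import re

CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def best_k(log_text):
    """Return (best K, {K: CV error}) parsed from concatenated log text."""
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no 'CV error' lines found in the log text")
    return min(errors, key=errors.get), errors


# Illustrative values only (hypothetical run, K = 2..5).
SAMPLE_LOGS = """\
CV error (K=2): 0.48190
CV error (K=3): 0.47835
CV error (K=4): 0.48236
CV error (K=5): 0.49001
"""

if __name__ == "__main__":
    k, errors = best_k(SAMPLE_LOGS)
    for value in sorted(errors):
        print(f"K={value}: {errors[value]:.5f}")
    print(f"lowest CV error at K={k}")
```

      The lowest CV error is a guide rather than a verdict: neighbouring K values with nearly identical errors can still be informative to inspect.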


      I won’t lie to you, it’s a hassle and time-consuming lol. Running it using public datasets is not as bad, but merging private commercial data to them can be challenging, especially if you are trying to keep a decent number of SNPs.
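      To see how many SNPs survive a merge, one can compare the two `.bim` files before merging. The sketch below assumes the standard six-column `.bim` layout (chromosome, SNP ID, genetic distance, base-pair position, two alleles); the SNP IDs and positions in the example are invented for illustration.

```python
# Sketch: find SNPs shared between a personal .bim file and a reference .bim
# file, and flag those whose chromosome/position differ (often a sign that
# the two files use different genome builds).

def read_bim(lines):
    """Map SNP id -> (chromosome, base-pair position) from .bim lines."""
    snps = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, snp_id, _genetic_distance, bp = fields[:4]
        snps[snp_id] = (chrom, int(bp))
    return snps


def shared_snps(own, reference):
    """Return (ids with matching positions, ids with mismatching positions)."""
    common = [snp for snp in own if snp in reference]
    same = sorted(snp for snp in common if own[snp] == reference[snp])
    moved = sorted(snp for snp in common if own[snp] != reference[snp])
    return same, moved


# Invented example data, not real coordinates.
OWN_LINES = ["1 rs0001 0 1000 A G", "1 rs0002 0 2000 C T", "2 rs0003 0 3000 A C"]
REF_LINES = ["1 rs0001 0 1000 A G", "1 rs0002 0 2500 C T", "3 rs0009 0 900 G T"]

if __name__ == "__main__":
    same, moved = shared_snps(read_bim(OWN_LINES), read_bim(REF_LINES))
    print("usable SNPs:", same)
    print("position mismatches:", moved)
```

      The list of usable SNP IDs, written one per line to a text file, is the kind of input PLINK's `--extract` option expects.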

      “Sampling strategy” is tricky.
      It doesn’t seem like it at first, but it’s perhaps the most challenging part, in my opinion.
      You have to be aware of previous studies and what has been established already for the populations you are interested in.

    17. Thanks again for replying so thoroughly, Thierno.

      I must say that I'm very pleased that those "quickies" I had already forgotten about have been of some use to you. It's always a moral gratification when you know that your little contributions have some value for others.


    18. Sure. You are welcome.

      My sample is in the exercises of the post, so issues in relation with the genetic origins of Fula people eventually needed to be addressed with clarifications.

      In the French speaking world, I realized that many people haven’t been exposed to this information. Eventually, one day maybe, I would like to share all this information somewhere, this time in French.

      Except for the presence of Fula communities in other non-West African countries and in some cases, some inter-mixing with other communities, depending on private anecdotal examples, unfortunately, many fictitious stories have been perpetuated on the so-called origins of the Fulbe people. Many historical accounts of so-called origins or migrations aren’t always true.

    19. The findings of this new paper called “Echoes from the last Green Sahara: whole genome analysis of Fulani, a key population to unveil the genetic evolutionary history of Africa,” contradict the hypothesis that the NW African component of Fulani is solely due to recent admixture between western African and northern African groups.

      The so-called Muslim expansion south of the Sahara in historical times as the only explanation for the origin of Fulani people is not credible, especially considering these admixture, qpWave, qpAdm and older Y-DNA analyses. They reveal much more complex and ancient phenomena which may have been more widespread across the Sahara and Sahel than previously anticipated. The advantage of their analyses is that they used more than 1800 samples, including ancient DNA samples such as Iberomaurusians, from the relevant geographic regions and continents. This contrasts with other DNA studies about these populations, which were more limited both in scope and in sequencing depth & coverage.

      Some of the findings in this paper confirm what I had long suspected: that there could be some deeper & older underlying phenomena associated with the genetic origins of Fula people and similar groups across the Sahel that have been largely unexplored up to this point. This is in part why, in relation to the Puerto-Rican/Guanche connection with my data, I had limited myself by simply implying that it could suggest some complex genetic links with other Northwest African populations and didn’t mention recent historic Muslim expansion south of the Sahara as a hypothesis for disentangling the complex origins of Fulani people. At best, I considered it to be just anecdotal depending on each person’s ancestry.

  2. A question remains open: does this apply to all the Spanish Caribbean, Mexico and Colombia included, as suggested by the overwhelming "black" component in Moreno-Estrada 2013 or are there nuances? Only future more extensive research can clarify this.

    1. Could you check Iberomaurusian admixture in Puerto Ricans?

    2. It’s not clear what new insight could be gained by comparing Puerto Ricans (who seem to have gone through recent founder effect) with “Iberomaurusian” ancient samples. To identify what precisely?

      A limited analysis which would include Guanche samples, other selected Northwest African populations, and possibly ancient samples from Taforalt, as the main focus, could help us improve our understanding of the complex genetic history of the region.

      From a technical standpoint, whoever it might be, would have to find the samples, sort through them, and experiment with them, I guess. In any case, it would be a difficult task.

      There was a study: Loosdrecht et al. (2018). It included ancient DNA samples from Taforalt. They have been compared to present-day North African and Canarian samples, in connection with other ancient samples. Just FYI.

    3. This new paper, “Northwest African Neolithic initiated by migrants from Iberia and Levant” from Simões et al. (2023), shows that the Guanches from the Canary Islands are much more similar to Late and Middle Neolithic North Africans (KEB), than they are to Iberomaurusian/Taforalt, Epipaleolithic and Early Neolithic (KTG & IAM) North Africans. PCA analysis (Fig.1) also shows that the Guanches are shifted towards ancient Levantine populations. Interestingly, the Guanches’ ancestral components are also very similar to those of modern-day Northwest African samples that were used in this analysis: Saharawi and Mozabite populations from the Simons Genome Diversity Project (SGDP) dataset.

      All this information suggests that the ancestry of the Guanches, Saharawi people and other geographically similar Northwest African populations is unlikely to be different from other modern-day coastal Northwest Africans.


    4. 40 new ancient Canarian samples that are estimated to have lived from the 3rd to the 16th century were analyzed in this new paper from Serrano et al (2023), “The genomic history of the indigenous people of the Canary Islands.” As previously observed, the ancient Canarians are still very divergent from Paleolithic/Early Neolithic Moroccans, and cluster more closely with Late Neolithic Moroccans and present-day North African populations.

      However, all present-day Northwest Africans have less ancient North African, but a lot more West Asian & African ancestry compared to those ancient Canarian samples, and all of them are also very similar to one another (including Sahrawis). Sub-Saharan African ancestry was still detected in some of those ancient Canarian samples, suggesting the existence of trans-Saharan migrations in North Africa already before the Common Era. Moreover, those ancient Canarian samples seem to be divided in roughly 2 clusters. One located in the western islands that’s more closely related to Upper Paleolithic and Early Neolithic North Africans and the other in the eastern islands to European populations.


  3. "They concluded that the "black" component which was displayed on the ADMIXTURE graphs of the study most likely had a North African origin, by way of Canarian aboriginal Guanche ancestry."

    I'm not sure I understand. Is this suggesting that Puerto Ricans do not in fact have a black African component derived from African slaves brought to the island? That would be plainly implausible (obviously black African ancestry in Puerto Rico exists, brought in the slave trade from west and central Africa), so I assume I am misunderstanding something. And as far as I know, Canarians have (and Guanches had) little black African/subsaharan admixture, so a Canarian origin for black ancestry in Puerto Ricans seems unlikely (again, I apologize if I am missing something, which it seems likely that I am).

    1. Hello Jm8,

      The color "black", in this context, is in reference to the display of ADMIXTURE analyses.
      Different colors are shown in order to distinguish between clusters.
      They should not be misunderstood for ethnic origins.

  5. This comment has been removed by the author.

    1. I see. Thank you for clarifying. It seems obvious now (and I had clearly misread the post/did not read it closely enough); all a rather embarrassing and silly mistake on my part.

    2. No problem: I can perfectly understand the misunderstanding. You actually exposed an issue in the clarity of the text. I would not have written it exactly that way but... not my article, so I added a clarification note in a different color.

    3. Note comment above deleted because I apparently have a way too loose tongue and disclosed private info about the author he doesn't want shared. Sorry in case context is lost, comments can't be edited in Blogger.

    4. The term "black component" was used throughout the post on "Caribbean autosomal ancestry" in 2013.
      I simply used the exact term that was written in the post and in its comment section.

  6. This comment has been removed by the author.

  7. https://www.biorxiv.org/content/early/2018/10/22/448829.full.pdf




    Kaixo Maju

    This article is about the genetic history of Siberia. Without going on too long, here are the important points:

    1-) The oldest Y-DNA P1 (ancestor of R and Q) was found in northeast Siberia.

    2-) ANE comes from another ancestor called Ancient North Siberian (ANS), and ANS are Sunghir and Tianyuan hybrids.

    3-) The Amerindians come from an ancestor called Ancient Paleosiberian (AP), and AP is an East Asian and ANE hybrid.

    4-) Baikal Neolithic (Kitoi culture, Y-DNA N and Q) was completely East Asian; Baikal Bronze Age was 50% AP.

    There is other interesting information. Good reading.

    1. P1 is not necessarily ancestral for R and Q, it can mean a parallel branch now rare or extinct. Archaeologically speaking, it should have migrated from West to East with the proto-Amerind (early Upper Paleolithic) migration eastwards, which is the only H. sapiens activity detected in Siberia before the LGM. The route of P > P1 > Q/R is clearly SE Asia > South Asia > West and Central Asia.

  8. A short update by Thierno has been added.

  9. Kaixo Maju

    Finally, we know the genetic structure of the Slab Grave (deer stone) culture, one of the most important archaeological cultures of Mongolia.

    The Slab Grave Culture was considered proto-Mongol or proto-Turkic or some kind of "Altaic" for a long time. However, the results show that the Slab Grave Culture was rooted in Paleo-Amerind (Yeniseian), like Early Bronze Age Mongolia, and its closest relative was the Okunev-Karasuk culture in the Altai-Sayan region.

    Another important conclusion is that pastoralism in Late Bronze Age Mongolia, unlike the spread of Neolithic farming in Europe and the expansion of Bronze Age pastoralism on the Western steppe, was adopted on the Eastern steppe by local hunter-gatherers through a process of cultural transmission and minimal genetic exchange with outside groups.

    (Note: I do not believe in an Altaic language family or common Altaic ancestry.)



    1. Ok. I don't understand why you "don't believe" in Altaic language family (which I understand as micro-Altaic: Turkic, Tungusic and Mongolic only), it's become quite mainstream (because the linguistic data seems to support it).

  10. No, on the contrary, the Altaic language theory has largely lost its validity today, and this result has been reached because the basic terms in the "micro-Altaic" languages do not resemble each other. Only some Turkic nationalists and the Nostraticists defend this theory.

    (The mastermind of the Altaic and Nostratic language families is Starostin. You know, the guy who believes that Basque and Chinese are related. There is much erroneous information and etymology in his so-called Altaic dictionary.)

    1. Well, seems you could be right: https://en.wikipedia.org/wiki/Altaic_languages

    2. Anyhow the list of advocates and opponents and people in between is very much balanced. The issue seems unresolved.

  11. https://www.researchgate.net/publication/327831316_THE_ADAPTATION_OF_THE_SEIMA-TURBINO_TRADITION_TO_THE_BRONZE_AGE_CULTURES_IN_THE_SOUTH_OF_THE_WEST_SIBERIAN_PLAIN

    Gabon ba Maju

    This is the freshest Seima-Turbino paper. Seima-Turbino (Krotov) metallurgy was long a controversial subject: was it local production or external production? Apparently it was local production and developed from the previous Odinov culture.

  12. Intriguing demonstration of how unsupervised ADMIXTURE runs can contribute to our understanding of the significance of Canarian/Guanche genetic legacy among Puerto Ricans! And quite likely also for other Hispanic Caribbeans.

    "This finding contrasts with the typically much lower North African scores of Hispanic Caribbeans that are reported in commercial autosomal DNA tests. I suspect that the use of Mozabite samples as proxies for North African may conflate their Berber ancestral component with the Iberian ancestral side of their complex genetic makeup."

    Indeed, I find it striking how your choice of Moroccan & Sahrawi samples results in a much sharper delineation! Even though given both ancient & more recent genetic connections between the Iberian Peninsula and North Africa such overlap is not surprising in itself.

    Despite several shortcomings I do find it useful to compare commercial autosomal DNA test results across a wide and relevant selection of various nationalities. Firstly to establish where a certain ancestral component might be most or least prevalent. And secondly to contrast these findings with historical plausibility and/or known genealogy. In order to find out how much correlation might already be found, even if obviously imperfect ;-)

    I have performed such an exercise for various Afro-descended people, incl. Puerto Ricans as well as native Africans, based on a survey of over a thousand AncestryDNA results. The focus was on the African breakdown, incl. Ancestry's own version of an "Africa North" region.

    As you rightfully point out, the North African scores on commercial DNA testing platforms such as Ancestry appear to be underestimations, given also how North Africans themselves rarely scored above 60% "Africa North" (before the recent update of September 2018). However, I still find it fascinating how this pronounced North African connection for Puerto Ricans that you found also came to light during my survey of AncestryDNA results among the Afro-Diaspora. In my most recent summary of these findings (featuring 8 nationalities and 860 samples), the group average for 110 Puerto Ricans was 2.9% "Africa North", with a maximum individual score of 10%. And this level (again, even if quite likely an underestimation) turned out to be the highest when compared with the other nationalities, save for Cape Verdeans (n=90), who showed the exact same group average of 2.9% "Africa North".

    See also this table:


    And this link for all the references:

    1. In general the problem with commercial autosomal DNA analyses is that it is too generic: they pre-select "artificial" model reference components (what I call "zombies", because they are not even actual populations but "distilled" components that are most concentrated in this or that area but often shared among many). So it may be an OK-ish preliminary approach but one that cannot replace properly designed unsupervised runs with an optimal "sampling strategy".

      In this case I "tutored" Thierno on how to design the ideal sampling strategy: we need of course some Native Americans and all the rest we need are samples from West Africa, NW Africa and SW Europe (Iberia). All the rest, say Swedes or Chinese or peninsular Arabs or Indians... are in excess because we know already Puerto Ricans don't have any meaningful inflow from those areas, they are not part of the question itself and can clutter the analysis in various ways, causing noise, etc.

      This is particularly true of "endogamic" populations (i.e. populations that have suffered severe bottlenecks and clearly have low internal genetic diversity). For example, many DNA testing companies use Ashkenazi Jews and people go literally nuts because they get some X% (usually small and meaningless) of that affinity, when the problem is that such endogamous populations tend to create their own artificial clusters or components (especially if oversampled) and attract others to their group in what is clearly an artifact: a bug and not a desirable feature of the algorithm. Some of that can happen with Mozabites and it definitely happens with the 1000Genomes Tunisian sample, which is from some endogamic population of Southern Tunisia and always gives weird results. It happens with Finns in European analyses, etc.

      Anyhow, finding this result was relatively hard. It is clearly there and is notable but it took several trials with different sampling strategies to have a clear result. The simplified straightforward sampling strategy I proposed: some NW Africans, some "controls" (Iberians, West Africans and Native Americans), worked best in the end.
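      One simple way to guard against the oversampling problem described above is to cap the number of individuals per population before running ADMIXTURE. The sketch below is only an illustration of that idea, not the procedure used for this post: the population sizes, sample IDs and the cap of 10 are all hypothetical.

```python
# Sketch: cap each population at a maximum number of individuals so that no
# single (possibly endogamous) group dominates an unsupervised run.
import random
from collections import defaultdict


def cap_per_population(samples, max_per_pop, seed=0):
    """samples: list of (population, individual_id) pairs.

    Populations larger than `max_per_pop` are randomly downsampled with a
    fixed seed; smaller ones are kept whole.
    """
    by_pop = defaultdict(list)
    for pop, iid in samples:
        by_pop[pop].append(iid)
    rng = random.Random(seed)
    kept = []
    for pop in sorted(by_pop):
        ids = sorted(by_pop[pop])
        if len(ids) > max_per_pop:
            ids = sorted(rng.sample(ids, max_per_pop))
        kept.extend((pop, iid) for iid in ids)
    return kept


# Hypothetical sample list: population names follow the post, sizes are invented.
SAMPLES = ([("Mozabite", f"MZB{i:02d}") for i in range(27)]
           + [("Moroccan", f"MOR{i:02d}") for i in range(8)]
           + [("Iberian", f"IBS{i:02d}") for i in range(14)])

if __name__ == "__main__":
    for pop, iid in cap_per_population(SAMPLES, max_per_pop=10):
        print(pop, iid)
```

      The printed population and ID pairs have the two-column shape that PLINK's `--keep` option reads, provided the first column matches the family IDs in the `.fam` file.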

  13. "I have also been asked to replace Yoruba with Senegambian Mandinka samples to check for potential differences. This is something that I had already checked, but I didn't notice any difference in either the Berber percentage in Puerto Ricans or in their homogenization, which indicated a recent founder effect."

    Thanks for letting me know! I was wondering if perhaps such Senegambian samples would make for a more suitable proxy for Tropical African DNA among North Africans as well as Puerto Ricans with minor SSA. But it doesn't seem to make much difference indeed, going by the adjusted results you showed me.

    In my survey findings based on AncestryDNA results for Puerto Ricans I actually also tried to account for likely substructure by subdividing my sample group into persons with minor SSA (<25%) and more pronounced SSA (>35%). Aside from the pronounced North African scores I also found it very interesting to see that for Puerto Ricans with a greater degree of SSA lineage other African regions tend to be more important, such as "Nigeria" and "Cameroon/Congo".

    While for Puerto Ricans with <25% SSA the regional scores for "Senegal" and "Mali" were much more predominant. I have hypothesized that this could indicate an Upper Guinean founding effect. Which would actually also be in line with the findings of Moreno-Estrada et al. (2013)!

    I first blogged about this in 2015 but I intend to delve into this topic again this year. Also extending my analysis to other Hispanic results, especially for Mexicans and Central Americans, among whom such an Upper Guinean founding effect also seems to be very apparent. See also:




  14. This comment has been removed by the author.

  15. " Independent research just published HERE shows that about half of the ancestry of Puerto Ricans is Berber (North African) rather than Iberian and thus that the Guanche or Canario ancestry hypothesis posited by Charles is almost certainly correct. "

    I wonder how any future research into autosomal DNA matching patterns might corroborate or refine these findings. Research conditions might not be ideal yet. But let's assume a customer database exists with 1000 Canarian DNA testers (preferably with increased Guanche lineage) as well as 1000 Spanish DNA testers (preferably from Spanish regions historically known to have been most involved in the colonization of the Hispanic Caribbean).

    When compared with a representative sample group of 1000 Puerto Rican DNA testers would one then indeed observe a nearly equal number of DNA matches between Puerto Ricans and Canarians on the one hand and between Puerto Ricans and Spaniards on the other? And would the average size of shared DNA indeed also be nearly equal for the matches between Canarians and Puerto Ricans when compared with the matches between Puerto Ricans and Spaniards?

    Or will additional and more complex ancestral scenarios complicate any straightforward comparison?

    Also I'm wondering if maybe the time framing of absorption of either Canarian or Spanish lineage among Puerto Ricans would be a major factor? Manifesting itself also in such DNA matching patterns. A parallel to be drawn perhaps with the fascinating time framing applied in Moreno-Estrada et al.(2013).

    1. You can replicate the exercise at home, all the software and samples are public domain and gratis. It just has a mild learning curve that should be no major issue for any interested person with a bit of patience.

      Of course the seal of approval of formal peer-reviewed research would add some clout of greater credibility but this is as scientific as it gets, including the possibility of replication by anyone. The more the merrier anyhow, so I do hope for more people independently researching this issue in the future, as it has some interest for Puerto Ricans and other populations (Canarians, Berbers, other Caribbean peoples that might have similar results, etc.)
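
For anyone replicating the exercise, one small step after a run is to average the ancestry fractions per population. The sketch below assumes the usual ADMIXTURE output layout, a `.Q` file with one row per individual and K whitespace-separated fractions, in the same order as the individuals in the input. The rows and population labels here are made up for illustration and are not results from the article. The components in a `.Q` file are unlabelled, so deciding which column corresponds to which ancestry is left to the reader.

```python
from collections import defaultdict

def mean_ancestry_by_population(q_lines, populations):
    """Average each ancestry component within each population.

    q_lines: rows of an ADMIXTURE .Q file (K whitespace-separated fractions).
    populations: one label per row, in the same order as the rows.
    """
    rows = [[float(x) for x in line.split()] for line in q_lines if line.strip()]
    if len(rows) != len(populations):
        raise ValueError("need exactly one population label per Q row")
    sums = {}
    counts = defaultdict(int)
    for row, pop in zip(rows, populations):
        if pop not in sums:
            sums[pop] = [0.0] * len(row)
        for k, value in enumerate(row):
            sums[pop][k] += value
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in totals] for pop, totals in sums.items()}

# Made-up K=3 fractions, purely illustrative.
q_lines = [
    "0.50 0.25 0.25",
    "0.25 0.50 0.25",
    "1.00 0.00 0.00",
]
populations = ["PuertoRican", "PuertoRican", "Iberian"]

means = mean_ancestry_by_population(q_lines, populations)
for pop, fractions in means.items():
    print(pop, [round(f, 3) for f in fractions])
```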

    2. Hello Fonte,

      Thanks for your input. It’s a good question, but it’s really hard to say. This notion of “matching” with someone else’s data is unclear.

      What is the methodology behind the so-called science which helps to determine how testers match with each other? It’s an unknown field for me.

      For example, I have a known relative who tested and who shares 32 centimorgans with my data. A decent number of other unknown testers (most likely Fulani), possibly from other countries, share higher values with my data. It’s confusing.

      Genetic studies usually don’t sample as many people from one ethnic group or one geographic area. They would possibly use ALDER or MALDER to detect significant admixture signals and ultimately deduce time estimates, and possibly qpAdm and f4/D statistics for admixture models.


      I think that there are really 2 different approaches to population genetics. And this explains why there is so much confusion, it seems.

      The methods and tools that are used by private commercial DNA companies and academic scientific studies are different, because the expectations aren’t the same.

      Commercial DNA deals with recent and private genealogy. It has to be more “palpable” for the customer base, so it won’t go into the depth of admixture analysis. Just enough to have discernible continentally and/or regionally separated categories. Of course, blog posts like yours add value to it because you collect results and look for patterns.

      On the other hand, academic studies that are produced by geneticists are not designed for this sort of individualized and recent genealogical background. They explore and analyze admixture more deeply with a much older time span.

    3. @Fonte
      Thanks for the ideas that you shared with me privately, about exploring “matching patterns” between samples. I familiarized myself with some notions from various sources. PLINK v1.07 has an old section aimed at pairwise IBD estimation and the detection of shared segments.
      I simply reused the samples from the analysis. So 100 Iberians, 70 Puerto Ricans, 17 Canarians, 17 Andalusians and 17 Galicians. Shared SNPs between Puerto Ricans and Iberians for the analysis are above 150K, while shared SNPs between Puerto Ricans and Canarians/Andalusians/Galicians are around 40K. I set a minimum threshold of 5 cM and allowed for all pairs to be reported.

      As expected, the counts of the number of pairs sharing cumulative IBD are the highest between Puerto Rican samples. In terms of frequency, it appears that single segments are prevalent between Puerto Ricans and the rest of the Iberian and Canarian samples.

      Probably due to the disparity between the number of Iberian and Canarian samples, only 4 Canarians match with 3 Puerto Ricans. In spite of this, intriguingly, one of those matching segments, between 1 Puerto Rican and 1 Canarian, is 51 cM. It is the highest of the entire analysis. According to various sources, the larger the segment the closer the relationship, while high-frequency shared segments are more likely to indicate distant sharing at the population level.
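
The kind of tally described in this comment (pairs sharing segments, cumulative IBD, longest segment, all above a 5 cM minimum) can be sketched in Python as follows. This is only an illustration under stated assumptions: it does not parse PLINK output, the input is a hypothetical list of (sample, sample, length in cM) tuples, and every ID and length below is made up.

```python
from collections import defaultdict

MIN_CM = 5.0  # minimum segment length, matching the 5 cM threshold above

def summarize_segments(segments, pop_of, min_cm=MIN_CM):
    """Summarize shared segments per pair of populations.

    segments: iterable of (sample_a, sample_b, length_cm) tuples.
    pop_of: dict mapping sample ID to population label.
    Returns {(pop1, pop2): {"pairs", "segments", "total_cm", "longest_cm"}},
    with the two population labels in alphabetical order.
    """
    pairs = defaultdict(set)
    stats = defaultdict(lambda: {"segments": 0, "total_cm": 0.0, "longest_cm": 0.0})
    for a, b, cm in segments:
        if cm < min_cm:
            continue  # below the minimum length, ignored
        key = tuple(sorted((pop_of[a], pop_of[b])))
        pairs[key].add(frozenset((a, b)))  # distinct pairs of individuals
        s = stats[key]
        s["segments"] += 1
        s["total_cm"] += cm
        s["longest_cm"] = max(s["longest_cm"], cm)
    return {key: {"pairs": len(pairs[key]), **s} for key, s in stats.items()}

# Made-up data, for illustration only.
pop_of = {"PR1": "PuertoRican", "PR2": "PuertoRican", "PR3": "PuertoRican",
          "CAN1": "Canarian", "IB1": "Iberian"}
segments = [
    ("PR1", "PR2", 12.0),
    ("PR1", "PR2", 7.5),
    ("PR1", "CAN1", 40.0),
    ("PR2", "IB1", 6.0),
    ("PR3", "IB1", 3.0),  # dropped: shorter than 5 cM
]
summary = summarize_segments(segments, pop_of)
for key, s in sorted(summary.items()):
    print(key, s)
```

Counting distinct pairs separately from segments keeps "many short segments" apart from "one long segment", which is the distinction drawn above between population-level sharing and a closer relationship.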

  16. Just for the record, the basic instructions on how to use Admixture were explained by Razib Khan in this 2011 entry: http://blogs.discovermagazine.com/gnxp/2011/03/analyzing-ancestry-with-admixture-step-by-step/

  17. I remember when I first checked this link. Unfortunately, many links of Khan's post had already expired. I couldn't even find the dataset for the populations.
    So I skipped it, and went straight to use the manuals from ADMIXTURE, PLINK, and R.
    I would suggest checking the links in the appendix section of this post.

    This site has many datasets (incl. from Lazaridis): https://reich.hms.harvard.edu/datasets

    For ADMIXTURE: http://software.genetics.ucla.edu/admixture/

    1. Thanks for explaining that, Thierno. I must admit I gave you that lead and then you figured out all the rest on yourself. So sure: better follow your references.

    2. *"On your own" (I don't think "on yourself" is a valid English phrase, sry).

  18. Article got a new update or "follow up" by Thierno.

  19. Nassim Taleb et al. just published a new version of their paper called “Informational Rescaling of PCA Maps with Application to Genetic Distance” which aims to improve the underlying mathematical statistics of genetic distances. The distances between populations in conventional PCA are significantly changed with this entropy based method. Hundreds of thousands of published admixture results could potentially be revised under this new method.
    Link: https://arxiv.org/pdf/2303.12654.pdf

    Here’s an interesting finding regarding Latin American samples from the 1000 Genomes Project:

    “For example, for Puerto Ricans (PUR) and Colombians (CLM) conventional PCA spreads them along the beginning of the African cline, whereas rescaling shows them in the vicinity of other Latin American populations (Mexicans and Peruvians). Iranians, Turkish, Palestinian, Druze, French, Iberian (IBS), British (GBR), Russian, Finnish, Puerto Rican, and the majority of Colombians all form a much tighter cluster in the rescaled PCA, indicating that these populations are not as far from each other as the conventional PCA suggests.”

