Guest article by Thierno
A discussion of the 2013 study on Caribbean autosomal ancestry by Andrés Moreno Estrada et al., “Reconstructing the Population Genetic History of the Caribbean,” was posted on this blog:
There were two important elements of information to consider from said post.
1) The ADMIXTURE graphs displayed a "black" component, largely found in Caribbean Admixed Latinos but only poorly represented in South Europe, which suggested a "recent" founder effect some 500 years ago. [Note: "black" here refers to the color coding of an autosomal component in Moreno 2013, not to Tropical African ancestry, which was color-coded as "green"; please follow the link above for more details].
2) An interesting and informative discussion between a Puerto Rican named Charles, in search of his ancestry, and another blogger named Maju shed light on the little-known historical contribution of Canarian aboriginal Guanches (Berber) to the colonization of America. It is often referred to as the "Tributo de Sangre" (Blood Tribute).
They concluded that the "black" component which was displayed on the ADMIXTURE graphs of the study most likely had a North African origin, by way of Canarian aboriginal Guanche ancestry.
This graph represents the stacked bar-plot of an unsupervised ADMIXTURE exercise which is aimed at studying the complex and intricate ancestral components of Puerto Ricans from Puerto Rico, based on samples that were collected from the 1000 Genomes panel.
The populations represented in these ADMIXTURE graphs were chosen, first, to account for the major, historically known contributors to the Puerto Rican population: Iberians, indigenous Caribbeans, and former African slaves, who are represented, respectively, by the Iberian (IBS), "Maya", and "Yoruba" samples.
Secondly, the presence of the merged North African samples in the dataset of these ADMIXTURE graphs serves as a formal test of comparison with the Iberian population in order to verify the aforementioned hypothesis.
The graph for K=4 clearly shows the "light-blue" component, represented in the Puerto Rican (PUR) samples, in addition to their Iberian (red), “Maya-like” (green), and “Yoruba-like” (purple) contributions.
The "light-blue" component is largely restricted to the North African population and also mostly found in the Saharawi samples, making it a "Saharawi-like" component. In other words, it is the identifiable North African component of this ADMIXTURE exercise.
This finding contrasts with the typically much lower North African scores of Hispanic Caribbeans that are reported in commercial autosomal DNA tests. I suspect that using Mozabite samples as the proxy for North Africans may cause the Berber ancestral component of Hispanic Caribbeans to be conflated with the Iberian side of their complex genetic makeup.
I included Canarian samples because they still display a minor distinct variation of North African admixture relative to Iberians, although it is important to keep in mind that individuals from those samples, as well as present-day Canarians, are more similar to Iberians from an autosomal genetic standpoint. Moreover, studies that were done on Canarian autosomal DNA have shown disparities in the amount of Guanche (Berber) admixture among individuals who are located in different Islands of the archipelago. Canarians from La Gomera seem to have retained the most Guanche ancestry.
Maju had a blog post about a paper on the estimates of Guanche or Berber genetic influence of Canarians here:
Hypotheses made in the recent past about a possible genetic link between Canarian aboriginal Guanches and Puerto Ricans, on the basis of the little-known role that the Canary Islands played in the colonization of the Americas, are supported by these unsupervised ADMIXTURE runs. Hypothetically, they could have similar implications for some Admixed Latinos and specific Caribbean communities, but most notably for Hispanic Caribbeans.
Reasons for investigating this issue
I am a person of Fula descent. I wasn't predisposed to experiment on this issue, in the sense that I have a different ethnic history. With the help of the software ADMIXTURE, I decided to use my autosomal data and compare it with publicly available datasets, which include populations that are compatible with my genetic history. In addition to my Fula-specific and West African ancestral components, which were similarly detected in the populations studied by Henn et al. in 2012, I also scored a North African percentage.
I had noticed that my data matched up considerably with "New World" Afro-descendants but also, very intriguingly, with a large number of Hispanic Caribbeans.
At first, I attributed it to the fact that West Africa was a region from which slaves were sent to the Americas.
However, when I tried to identify what specific ancestral components I share with some of those Hispanic Caribbean matches, a common restricted Northwest African ancestry seemed to emerge as a pattern with several of them. After reading the blog-post of Maju on Caribbean autosomal ancestry - several years after he posted it - and the possible Northwest African hypothesis of Hispanic Caribbeans, I figured I would try to verify it and maybe, at the same time, manage to elucidate some of my questions.
[Note (update Dec-31-2020): the sharing of this very drifted (PUR) component between the complex admixed Puerto Rican samples and my sample is difficult to interpret precisely, and from a historical standpoint, as the Lawson et al. paper makes very clear (please see the last update from 2 years ago). Comments below are clues for follow-up research.]
The graph for K=5 shows the Puerto Rican samples (PUR) homogenized into their own specific green component in comparison to the other populations, which suggests a recent founder effect that most likely took place over the past few centuries.
Very intriguingly, my North African component is replaced by this PUR-specific component rather than by the yellow North African one. It suggests that the "Guanche-Berber" side of the Puerto Ricans overlaps with my Northwest African component.
I would say that it indicates some complex genetic links between the Guanches and, possibly, other Northwest African populations.
I hope that these unsupervised ADMIXTURE exercises can be of help to those interested in the autosomal genetic links between Hispanic Caribbeans and Canarian aboriginal Guanches.
I used publicly available datasets to perform these ADMIXTURE exercises.
The first one contains a combined dataset of populations from both the 1000 Genomes Project and HGDP unrelated samples, for a total of 162,645 SNPs. It has been filtered and re-arranged by its contributors, Peter Carbonetto and Amir Kermany.
It belongs to the Ancestry DNA workshop on Github.com.
All the repositories can be accessed here: https://github.com/Ancestry
It was publicly available until a year ago and was utilized during the Computational, Evolutionary and Human Genomics (CEHG) Symposium.
The PUR (Puerto Ricans in Puerto Rico), IBS (Iberians from Iberia), Maya, and Yoruba samples were selected from this dataset.
The second dataset is from the Henn et al. study from 2012, “Genomic Ancestry of North Africans Supports Back-to-Africa Migrations.” It contains the North African samples that I used for the exercises. I merged them with the dataset that contains the PUR samples, and intersected 44,804 SNPs.
This is the link to access it: http://biologiaevolutiva.org/dcomas/north-african-affy-6-0-data-henn-et-al-submitted/
The third dataset is from the Botigué et al. study from 2013, “Gene flow from North Africa contributes to differential human genetic diversity in Southern Europe.”
It has Spain_S (Andalusians), Spain_NW (Galicians), and Canary Islanders. As with the second dataset, 44,804 SNPs intersected with the first (main) dataset.
The link to access it is here: http://biologiaevolutiva.org/dcomas/north-african-affy-6-0-data-henn-et-al-submitted/
I used the software PLINK to update the physical and genetic positions of the SNPs from the second and third datasets, in order to properly merge them with the ones from the first dataset. I also made sure to merge only SNPs that were already found in the selected dataset (1000 Genomes and HGDP).
Lastly, I intersected my personal data with the dataset (1000 Genomes and HGDP), for a total of 161,764 SNPs.
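The intersection step described above can be illustrated with a minimal Python sketch. It is not the procedure actually used for this post (the merging was done with PLINK); it only shows the idea of keeping SNPs present in every dataset, assuming each dataset has a standard six-column PLINK .bim file. The rsIDs and the variable names are hypothetical toy values.

```python
import io

def read_bim_ids(handle):
    """Return the set of variant IDs (second column) from a PLINK .bim file."""
    ids = set()
    for line in handle:
        fields = line.split()
        # .bim columns: chromosome, variant ID, cM position, bp position, allele 1, allele 2
        if len(fields) >= 6:
            ids.add(fields[1])
    return ids

def shared_snps(*bim_handles):
    """Intersect variant IDs across any number of .bim files."""
    id_sets = [read_bim_ids(h) for h in bim_handles]
    return set.intersection(*id_sets) if id_sets else set()

# Toy stand-ins for two .bim files (hypothetical rsIDs, not real data).
reference_bim = io.StringIO(
    "1\trs0001\t0\t1000\tA\tG\n"
    "1\trs0002\t0\t2000\tC\tT\n"
    "2\trs0003\t0\t3000\tG\tA\n"
)
north_africa_bim = io.StringIO(
    "1\trs0002\t0\t2000\tC\tT\n"
    "2\trs0003\t0\t3000\tG\tA\n"
    "3\trs0004\t0\t4000\tT\tC\n"
)

common = shared_snps(reference_bim, north_africa_bim)
print(sorted(common))
```

In a real workflow, a list like this could be written to a text file and passed to PLINK's --extract option before merging, although matching on IDs alone assumes that both datasets name their SNPs consistently.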
The software ADMIXTURE was used to estimate ancestry.
R was used to plot the estimates.
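For readers who want to inspect the estimates before plotting, here is a small Python sketch (the plots in this post were made in R, so this is an illustration rather than the code that was used). It assumes the usual ADMIXTURE .Q layout: one row per individual, K whitespace-separated proportions, in the same order as the individuals in the .fam file. The numbers and population labels below are made up.

```python
import io
from collections import defaultdict

def mean_ancestry_by_population(q_handle, populations):
    """Average each ADMIXTURE component over the individuals of each population.

    q_handle: lines of K whitespace-separated proportions, one per individual.
    populations: one population label per individual, in the same row order.
    """
    sums = {}
    counts = defaultdict(int)
    for line, pop in zip(q_handle, populations):
        values = [float(x) for x in line.split()]
        if pop not in sums:
            sums[pop] = [0.0] * len(values)
        sums[pop] = [s + v for s, v in zip(sums[pop], values)]
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in total] for pop, total in sums.items()}

# Toy K=4 output for three individuals (hypothetical values, not results from this post).
toy_q = io.StringIO(
    "0.70 0.10 0.10 0.10\n"
    "0.60 0.20 0.10 0.10\n"
    "0.00 0.00 0.00 1.00\n"
)
toy_labels = ["PUR", "PUR", "YRI"]

means = mean_ancestry_by_population(toy_q, toy_labels)
for pop, components in means.items():
    print(pop, [round(c, 3) for c in components])
```

Population averages like these hide the individual-level variation that the stacked bar plots show, so they are best read alongside the graphs rather than in place of them.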
Update (Oct 30th):
I would like to briefly elaborate on the sampling strategy. The first ADMIXTURE runs that I produced contained additional continental European populations, as well as other West Asian samples. In those runs, distinct color-coded components separating the North African and European samples only appeared at higher K values, which came with higher standard errors of the cross-validation error estimate.
I had asked for Maju’s insight on admixture analyses in the past, as I was interested in how his posts on West African and Berber genetics related to my personal autosomal DNA. I did the same for this analysis.
I followed Maju’s recommendation to limit the reference populations to be analyzed to just four: Iberians, West Africans, Mayans, and Northwest Africans. This resulted in the clear and distinctive display of Berber and Iberian components, starting at K=4, which has a lower standard error. I later added Canary Islander samples.
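As an aside on comparing K values: when ADMIXTURE is run with its --cv option, each run's log contains a line of the form "CV error (K=4): 0.52". The Python sketch below collects those lines and reports the K with the lowest error. The numbers are invented for illustration and are not the values obtained for this post, and the sketch only handles the cross-validation error itself, not the standard errors mentioned above.

```python
import re

# Matches lines such as "CV error (K=4): 0.5210" in ADMIXTURE log output.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")

def parse_cv_errors(log_text):
    """Map each K to its cross-validation error from concatenated ADMIXTURE logs."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}

def best_k(cv_errors):
    """Return the K with the lowest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)

# Made-up log excerpt (hypothetical numbers).
toy_log = """
CV error (K=3): 0.5400
CV error (K=4): 0.5210
CV error (K=5): 0.5255
"""

errors = parse_cv_errors(toy_log)
print(errors)
print("lowest CV error at K =", best_k(errors))
```

The lowest cross-validation error is only a guide; nearby K values with similar errors can still be informative, which is why both K=4 and K=5 are shown in this post.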
Note: I have also been asked to replace Yoruba with Senegambian Mandinka samples to check for potential differences. This is something that I had already checked, but I didn't notice any difference in either the Berber percentage in Puerto Ricans or in their homogenization, which indicated a recent founder effect.
Update March 14th 2019:
After this article was posted last October, I received a lot of interesting feedback on the admixture analyses and suggestions for different ancestral contributions of Hispanic Caribbeans, both in private messages and in the comment section of this post. In light of this, I would like to go over some aspects of the analysis again.
A note of caution in the interpretation of estimates
The estimates of the clusters from ADMIXTURE are not to be interpreted literally. The different ancestral k components are not “real” populations. They are designed to help identify differentiation between populations.
Both supervised and unsupervised analyses will produce FSTs between the designated populations or between the clusters. They serve as an approximate gauge of genetic differentiation. In this type of analysis, as we can observe in the graphs contained in this post, moderate amounts of the components that are less divergent from each other overlap across populations which share lower FSTs. Considering that the FST between North Africans and West Eurasians is usually around 0.06, there will inevitably be a shared overlapping effect. As a result, it isn’t possible to obtain a very precise delineation between North African and Iberian samples. So, essentially, this is an evaluation of variation and not an accurate system of measurement.
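To give a feel for what FST values of this size mean, here is a simplified Python sketch of Wright's FST at a single biallelic SNP for two equally weighted populations. It is a textbook definition based on expected heterozygosity, not the estimator that ADMIXTURE reports, and it ignores sample-size corrections; the allele frequencies are hypothetical.

```python
def fst_two_populations(p1, p2):
    """Wright's FST at one biallelic SNP for two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity of the pooled population
    if h_total == 0.0:
        return 0.0  # monomorphic site: FST is undefined, 0 is returned by convention here
    # Mean expected heterozygosity within the two populations.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total

# Hypothetical allele frequencies of 40% and 60% in two populations.
print(round(fst_two_populations(0.4, 0.6), 3))
```

Under these assumptions, an allele at 40% in one population and 60% in the other gives an FST of roughly 0.04, which illustrates why populations separated by values in this range are hard to delineate precisely.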
Intuitively, it seems that analyses which contain populations or clusters that are separated with higher FSTs will be more robust. It also seems that when FSTs fall below 0.05, the degree of differentiation in the displayed clusters is difficult to evaluate or make sense of. This may explain why analyses of intra-European/Mediterranean populations with FSTs that are around 0.01 are difficult to evaluate with ADMIXTURE. Other steps can be taken to mitigate the effects of linkage disequilibrium, as was the case for the dataset that was used for the analysis in this post.
ADMIXTURE works better for recently admixed groups who derive their ancestry from distinct populations. For obvious historical reasons, African Americans and Hispanic Americans have recent ancestries that most admixture analyses can detect fairly well.
Evidently, the full complexity and chaotic processes of ancient migrations, which are dynamic rather than static, cannot realistically be captured by ADMIXTURE. The complete reconstruction of such patterns on the basis of present-day populations would obviously be misleading.
Daniel J. Lawson, Lucy van Dorp and Daniel Falush wrote a paper called, “A tutorial on how (not) to over-interpret STRUCTURE/ADMIXTURE bar plots” (2018) in which they warned against some of the pitfalls of admixture analyses.
While it’s not possible to make exact predictions from tools that are used in the field of population genetics, when interpreted correctly, some interesting information can still be extracted from various analyses.
Previous research on the possible Canarian legacy in America, including the examination of historical records, had been conducted prior to the publication of the Moreno et al. (2013) paper. With regard to the genetic affinity of the aboriginal inhabitants of the Canary Islands, a similar analysis was done more recently on the autosomal DNA of ancient Guanche individuals who may have lived between the 7th and the 11th century; it is discussed in a paper by Rodríguez-Varela et al. (2017). The authors conclude that a Northwest African-specific ancestry component makes up the majority of their autosomal ancestry, as it does in other Berber populations from North Africa. Additionally, Y-DNA E1b-M81, which is found at high frequencies in Northwest Africa, was also detected in these samples.
In the study from Arauna et al. (2016) on how “Recent Historical Migrations Have Shaped the Gene Pool of Arabs and Berbers in North Africa,” the authors expressed doubts about the use of Mozabite samples as the sole proxy for North African genetic diversity.
Considering that the paper from Moreno et al. didn’t have North African samples, the focus of this post was to explore potential variations by including Northwest African samples such as Moroccans and Saharawis.
Naturally, to exactly what extent inhabitants of the Canary Islands – whose gene pool could have already been affected by the DNA of Iberian settlers - may have impacted the genetic pool of Hispanic Caribbeans is a question which would require further and more diversified analyses.
Several studies have reported mtDNA L(xM,N) among various Latin American communities. They strongly suggest recent African ancestry in the context of the recent colonization of the New World. It seems that L(xM,N) lineages forming specific subclades that are not native to Africa, but are instead found on other continents or in other regions, are extremely rare.
In 2012 Cerezo et al. published a paper on the subject, titled “Reconstructing ancient mitochondrial DNA links between Africa and Europe.”
Another study, published by Ricardo Rodriguez-Varela and his colleagues, is called “Genomic Analyses of Pre-European Conquest Human Remains from the Canary Islands Reveal Close Affinity to Modern North Africans.”
More recently, a paper called “Mitogenomes illuminate the origin and migration patterns of the indigenous people of the Canary Islands” was published by Rosa Fregel with the mtDNA sequencing of 48 ancient individuals. Out of all of the L(xM,N) lineages that were analyzed, only the newly defined L3b1a12 was identified as a new Canarian-specific lineage.
It appears that European and Canarian autochthonous mtDNA L(xM,N) lineages form subclades which correspond to specific mutations that are less likely to be found in Africa.
In the case of Puerto Ricans, there was a project from National Geographic called “Genographic Project DNA Results Reveal Details of Puerto Rican History” (2014). After sampling 326 individuals from southeastern Puerto Rico and Vieques, they found that 80% of Puerto Rican men carry West Eurasian (or European) Y-DNA paternal lineages, while 60% of Puerto Ricans carry maternal lineages of Native American origin. This may shed some more light on the findings of Moreno et al. (2013), who wrote of the “Latin-European” component which seemed to indicate a founder effect.
In contrast, it would be interesting for future research to sample Hispanic Caribbean communities where African ancestry may have been retained in higher proportions and, in the process, collect more mtDNA and Y-DNA.