October 11, 2018

Major Guanche genetic influence in Puerto Ricans (guest article by Thierno)

Guest article by Thierno


A discussion of a 2013 study on Caribbean autosomal ancestry by Andrés Moreno Estrada et al., "Reconstructing the Population Genetic History of the Caribbean," was posted on this blog:


There were two important elements of information to consider from said post.

1) The ADMIXTURE graphs displayed a "black" component, largely found in Caribbean Admixed Latinos but only poorly represented in South Europe, which suggested a "recent" founder effect some 500 years ago. [Note: "black" here refers to the color coding of an autosomal component in Moreno 2013, not to Tropical African ancestry, which was color-coded as "green"; please follow the link above for more details.]

2) An interesting and informative discussion between a Puerto Rican named Charles, in search of his ancestry, and another blogger named Maju shed light on the little-known historical contribution of Canarian aboriginal Guanches (Berber) to the colonization of America. It is often referred to as the "Tributo de Sangre" (Blood Tribute).


They concluded that the "black" component which was displayed on the ADMIXTURE graphs of the study most likely had a North African origin, by way of Canarian aboriginal Guanche ancestry.

K=4

This graph represents the stacked bar-plot of an unsupervised ADMIXTURE exercise which is aimed at studying the complex and intricate ancestral components of Puerto Ricans from Puerto Rico, based on samples that were collected from the 1000 Genomes panel.

The populations represented in these ADMIXTURE graphs were chosen, firstly, to account for the major, historically known contributors to the Puerto Rican population: Iberians, indigenous Caribbeans, and enslaved Africans. The latter two are represented by the "Maya" and "Yoruba" samples, respectively.

Secondly, the presence of the merged North African samples in the dataset of these ADMIXTURE graphs serves as a formal test of comparison with the Iberian population in order to verify the aforementioned hypothesis.

The graph for K=4 clearly shows the "light-blue" component, represented in the Puerto Rican (PUR) samples, in addition to their Iberian (red), “Maya-like” (green), and “Yoruba-like” (purple) contributions. 

The "light-blue" component is largely restricted to the North African population and also mostly found in the Saharawi samples, making it a "Saharawi-like" component. In other words, it is the identifiable North African component of this ADMIXTURE exercise. 

This finding contrasts with the typically much lower North African scores of Hispanic Caribbeans that are reported in commercial autosomal DNA tests. I suspect that the use of Mozabite samples as proxies for North Africans may conflate the Berber ancestral component of Hispanic Caribbeans with the Iberian side of their complex genetic makeup.

I included Canarian samples because they still display a minor, distinct variation of North African admixture relative to Iberians, although it is important to keep in mind that the individuals in those samples, like present-day Canarians, are more similar to Iberians from an autosomal genetic standpoint. Moreover, studies of Canarian autosomal DNA have shown disparities in the amount of Guanche (Berber) admixture among individuals from different islands of the archipelago. Canarians from La Gomera seem to have retained the most Guanche ancestry.

Maju had a blog post about a paper on the estimates of Guanche or Berber genetic influence on Canarians here:


Hypotheses made in the recent past about a possible genetic link between Canarian aboriginal Guanches and Puerto Ricans, on the basis of the little-known role that the Canary Islands played in the colonization of the Americas, are supported by these unsupervised ADMIXTURE runs. Hypothetically, they could have similar implications for some Admixed Latinos and specific Caribbean communities, but most notably for Hispanic Caribbeans.


Reasons for investigating this issue


I am a person of Fula descent. I wasn't predisposed to experiment on this issue, in the sense that I have a different ethnic history. With the help of the software ADMIXTURE, I decided to use my autosomal data and compare it with publicly available datasets, which include populations that are compatible with my genetic history. In addition to my Fula-specific and West African ancestral components, which were similarly detected in the populations studied by Henn et al. in 2012, I also scored a North African percentage.

I had noticed that my data matched up considerably with "New World" Afro-descendants but also, very intriguingly, with a large number of Hispanic Caribbeans.

At first, I attributed it to the fact that West Africa was a region from which slaves were sent to the Americas.

However, when I tried to identify what specific ancestral components I share with some of those Hispanic Caribbean matches, a common restricted Northwest African ancestry seemed to emerge as a pattern with several of them. After reading the blog-post of Maju on Caribbean autosomal ancestry - several years after he posted it - and the possible Northwest African hypothesis of Hispanic Caribbeans, I figured I would try to verify it and maybe, at the same time, manage to elucidate some of my questions.

K=5

The graph for K=5 shows the Puerto Rican samples (PUR) homogenized into a specific green component of their own, in contrast to the other populations, which suggests a recent founder effect that most likely took place over the past few centuries.

Very intriguingly, my North African component is assigned to this PUR-specific component instead of the yellow North African one. It suggests that the "Guanche-Berber" side of the Puerto Ricans overlaps with my Northwest African component.

I would say that it indicates some complex genetic links between the Guanches and, possibly, other Northwest African populations.

I hope that these unsupervised ADMIXTURE exercises can be of help to those interested in the autosomal genetic links between Hispanic Caribbeans and Canarian aboriginal Guanches.

Thierno


Appendix


I used publicly available datasets to perform these ADMIXTURE exercises.

The first one contains a combined dataset of populations from both the 1000 Genomes Project and HGDP unrelated samples, for a total of 162,645 SNPs. It has been filtered and re-arranged by its contributors, Peter Carbonetto and Amir Kermany.

It belongs to the Ancestry DNA workshop on Github.com.

All the repositories can be accessed here: https://github.com/Ancestry

It was publicly available until a year ago and was utilized during the Computational, Evolutionary and Human Genomics (CEHG) Symposium.

The PUR (Puerto Ricans in Puerto Rico), IBS (Iberians from Iberia), Maya and Yoruba samples were selected from this dataset.

The second dataset is from the Henn et al. study from 2012, “Genomic Ancestry of North Africans Supports Back-to-Africa Migrations.” It contains the North African samples that I used for the exercises. I merged them with the dataset that contains the PUR samples, which left an intersection of 44,804 SNPs.


The third dataset is from the Botigué et al. study from 2013, “Gene flow from North Africa contributes to differential human genetic diversity in Southern Europe.”
It has Spain_S (Andalusians), Spain_NW (Galicians), and Canary Islanders. I likewise intersected it with the first (main) dataset, again for 44,804 SNPs.
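For readers who want to reproduce the SNP intersection step, here is a minimal sketch in Python. It assumes the datasets are in PLINK binary format and that variants can be matched by the ID in the second column of each .bim file; the variant IDs and file contents below are invented for illustration, and this is not necessarily how the intersection was done for this article.

```python
from typing import Iterable, Set


def snp_ids_from_bim(lines: Iterable[str]) -> Set[str]:
    """Collect variant IDs (second column) from the lines of a PLINK .bim file."""
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            ids.add(fields[1])
    return ids


def shared_snps(bim_a: Iterable[str], bim_b: Iterable[str]) -> Set[str]:
    """Return the variant IDs that are present in both datasets."""
    return snp_ids_from_bim(bim_a) & snp_ids_from_bim(bim_b)


# Toy .bim content: chromosome, variant ID, cM, bp, allele 1, allele 2.
# These rows are hypothetical, not taken from the real datasets.
main_bim = [
    "1 rs0001 0 1000 A G",
    "1 rs0002 0 2000 C T",
    "2 rs0003 0 3000 G A",
]
north_africa_bim = [
    "1 rs0002 0 2000 C T",
    "2 rs0003 0 3000 G A",
    "2 rs0004 0 4000 T C",
]

common = shared_snps(main_bim, north_africa_bim)
print(sorted(common))
```

Matching on IDs alone does not check that alleles and strands agree between datasets, so a real merge would still need that verification.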


I used the software PLINK to update the physical and genetic positions of the SNPs from the second and third datasets, in order to properly merge them with the ones from the first dataset. I also made sure to merge only SNPs that were already found in the selected dataset (1000 Genomes and HGDP).
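The PLINK steps described above could look roughly like the following sketch, written as a small Python helper that only builds and prints the command lines. It assumes PLINK 1.9 and its `--update-map`, `--update-cm`, `--extract` and `--bmerge` flags; every file name is a hypothetical placeholder, and the exact commands used for this article are not recorded here.

```python
from typing import List


def plink_step(prefix: str, flag: str, argument: str, out: str) -> List[str]:
    """Build one PLINK call that applies a single flag and writes a new binary fileset."""
    return ["plink", "--bfile", prefix, flag, argument, "--make-bed", "--out", out]


def merge_pipeline(secondary: str, main: str) -> List[List[str]]:
    """Return the PLINK calls, in order, for merging `secondary` into `main`.

    All lookup and output file names are hypothetical placeholders.
    """
    return [
        # 1. Overwrite base-pair positions from a two-column file (variant ID, new bp).
        plink_step(secondary, "--update-map", "main_bp.txt", "sec_bp"),
        # 2. Overwrite genetic (centimorgan) positions in the same way.
        plink_step("sec_bp", "--update-cm", "main_cm.txt", "sec_cm"),
        # 3. Keep only the variants listed in the shared-SNP file.
        plink_step("sec_cm", "--extract", "shared_snps.txt", "sec_shared"),
        # 4. Merge the filtered secondary fileset into the main one.
        plink_step(main, "--bmerge", "sec_shared", "merged"),
    ]


for command in merge_pipeline("north_africa", "kg_hgdp"):
    print(" ".join(command))
```

Since a PLINK merge keeps the union of variants, the main fileset would need the same `--extract` filter beforehand if a strict intersection is wanted.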

Lastly, I intersected my personal data with the dataset (1000 Genomes and HGDP), for a total of 161,764 SNPs.

The software ADMIXTURE was used to estimate ancestry.

R was used to plot the estimates.
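As an illustration of what the bar-plots summarize, here is a minimal sketch in Python (rather than R) that reads ADMIXTURE-style Q output, one row of K ancestry fractions per individual, and averages it per population. The population labels and numbers are invented placeholders, not results from the runs above.

```python
from typing import Dict, List


def parse_q(lines: List[str]) -> List[List[float]]:
    """Parse the rows of a .Q file: whitespace-separated ancestry fractions."""
    return [[float(x) for x in line.split()] for line in lines if line.strip()]


def population_means(q_rows: List[List[float]], labels: List[str]) -> Dict[str, List[float]]:
    """Average the ancestry fractions of all individuals sharing a population label."""
    if len(q_rows) != len(labels):
        raise ValueError("need exactly one population label per individual")
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for row, pop in zip(q_rows, labels):
        if pop not in sums:
            sums[pop] = [0.0] * len(row)
            counts[pop] = 0
        for k, value in enumerate(row):
            sums[pop][k] += value
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in sums[pop]] for pop in sums}


# Hypothetical K=4 output for four individuals from three made-up populations.
q_lines = [
    "0.60 0.15 0.15 0.10",
    "0.70 0.05 0.15 0.10",
    "0.95 0.01 0.01 0.03",
    "0.10 0.00 0.05 0.85",
]
labels = ["POP_A", "POP_A", "POP_B", "POP_C"]

means = population_means(parse_q(q_lines), labels)
for pop, fractions in means.items():
    print(pop, [round(f, 3) for f in fractions])
```

The labels are assumed to be listed in the same order as the individuals in the input fileset, which is the order in which ADMIXTURE writes its rows.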

April 28, 2018

Video: Do genes make you fat?

I don't usually deal with the medical aspects of genetics, but this talk by Giles Yeo is so engaging and clarifying that I believe it deserves an entry here:



April 8, 2018

Luxmanda: a 3,000 years-old proto-Horner in Tanzania

I knew, more intuitively than rationally, that the Horner (Ethiopian, Somali, Eritrean) type of Afro-Eurasian admixture was very old, but I had no idea it was this old. I knew that the West Eurasian Upper Paleolithic had an impact on Africa (LSA), but I did not know that it reached so far south, nor that it had left such a massive legacy as ancient DNA reveals.

Pontus Skoglund et al. Reconstructing Prehistoric African Population Structure. Cell 2017 (open access). DOI:10.1016/j.cell.2017.08.049

The data analysis speaks volumes by itself:

Figure 1 - Overview of Ancient Genomes and African Population Structure


Figure S2 - Ancient Individuals and African Population Structure

Figure 2 - Ancestral Components in Eastern and Southern Africa

We show bar plots with the proportions inferred for the best model for each target population. We used a model that inferred the ancestry of each target population as 1-source, 2-source, or 3-source mixture of a set of potential source populations.


There is so much that I don't really know what else to say. Of course, this is just a sample of what is in the paper, so read it. I'm sure there will be plenty of comments even though the study was published months ago.

Regarding haploid DNA, I don't see anything outstanding but, as I know there is generally quite a bit of interest, these are screenshots of the ancient lineages found (full data in the supp. materials of the study):

Ancient Y-DNA (screenshot)
Ancient mtDNA (screenshot)

Related: No Iberian in Iberomaurusian.

Correction: I first titled this "a 30,000 years-old...". That was a major error on my part and I apologize for any confusion it may have caused. Thanks to Capra Internetensis for spotting it.

No Iberian in Iberomaurusian

After almost a century of controversy on the matter, it seems that archaeogenetics solved the riddle. Not in the sense I thought it would but it did anyhow.


Ancient DNA samples from Taforalt (Iberomaurusian or Oranian culture, Upper Paleolithic of North Africa) show no trace of Paleoeuropean ancestry (WHG). Instead, they show strong affinity to West Asians of Palestinian type, along with a significant amount of African Aboriginal ancestry, probably closer to the East African Hadza and Sandawe and to ancient Mota than to West African types. The result is roughly similar to Afars, but not quite the same in any case.

Fig. S8 - Taforalt individuals on the top PCs of present-day African, Near Eastern and South European populations.


Fig. S11 - ADMIXTURE results for a few informative K values.

So the conclusion must be that the Eurasian influence in the North African Upper Paleolithic (call it Iberomaurusian, Oranian or, my personal unorthodox preference, Taforaltian) arrived from West Asia, with whichever intermediate stage in Egypt and Cyrenaica, where that influence is quite apparently much older in the archaeological record. This seems to contradict the chronology of the Taforaltian, with Western sites producing older radiocarbon dates, but the genetic data seem overwhelming.

I must say I wish they had compared against Paleoeuropean samples older than WHG (Epipaleolithic), which are available, but I guess that some WHG influence would have shown up had there been an older European influx, because the various Paleoeuropean layers are not disconnected. Still, it is something someone should test, just in case.


Haploid DNA


The Taforalt sample was rich in mtDNA U6a, with also one instance of M1b:


All six male samples carried Y-DNA E1b1b, with most of them being well defined as E1b1b1a1-M78 (see table S16 for details).


Related: Luxmanda: a 3,000 years-old proto-Horner in Tanzania.

April 4, 2018

North African Neolithic was influenced by Europe... and European Chalcolithic by Iberian Neolithic

Or so it seems considering the data of Fregel et al., a study I have had on my to-do list for some time and that I don't see cited often, if at all.

Rosa Fregel et al., Neolithization of North Africa involved the migration of people from both the Levant and Europe. BioRxiv 2017 (pre-pub). DOI:10.1101/191569

The critical piece is probably this selection from the ADMIXTURE results, whose pattern repeats over and over through the study with many more analyzed populations from all of West Eurasia and North Africa:


We see how KEB (Morocco Neolithic) is a mix of European Neolithic intermediate between Iberia (purple) and Sardinian (blue) on one side and, on the other, something like Mozabites (not shown in this detail, cream). TOR is a new Neolithic sample from Andalusia.

Another ancient Moroccan sample, IAM (pre-Neolithic, not shown here either), is fully cream-colored, as modern Mozabites mostly are.

Interestingly, we see for the first time the emergence of a purple-colored component that differentiates the Iberian Early Neolithic from the rest (although this does not happen at lower K-values, so they are still related), a component that, in the MNChL (Middle Neolithic and Chalcolithic) period, somehow appears as dominant in Italy (no data for earlier times) and becomes quite dominant in Central Europe.

This is intriguing to say the least. It must be said that modern Sardinians and Basques (these probably, not labeled) are low in the purple component, although less than other populations, and that somehow the Early Neolithic (blue) component made a comeback:



I do not want to over-interpret all this (autosomal genetics is not an exact science) but, judging by KEB, the purple component is not just a generic southern branch (Cardium Pottery) distinction but something specifically Iberian or Italo-Iberian. The matter needs more research, but it is in any case very intriguing that the purple component seems to expand from Iberia or somewhere nearby (France? Italy?) in the period leading to the Chalcolithic, a most critical one in the formation of the genetics of Europe.

There is also a small hoard of mtDNA and Y-DNA, with G2a-M201 (in Europe), E1b-L19* (in pre-Neolithic North Africa) and T-M184 (in Neolithic North Africa) on the patrilineal side and quite a bit of varied K1a on the matrilineal one, as well as JT (also on both shores) and U6 and M1 in North Africa.

Worth reading and keeping in mind, no doubt.

March 31, 2018

Iberian genetic clusters

I've spent the last two weeks or so chewing on this pre-pub, and there comes a point when no more chewing seems useful. So let's discuss it as well as possible.

Clare Bycroft et al. Patterns of genetic differentiation and the footprints of historical migrations in the Iberian Peninsula. BioRxiv 2018 (pre-pub). DOI:10.1101/250191

The key finding is the clustering of the populations of the Iberian Peninsula, as in this map (locations for the Spanish state are precise; for Portugal they are unknown and placed at random, and the shading for Portugal is uniform across the whole country):

Supp. Figure 1a

The weirdest thing for me is that the Catalan-Alacant and Seville-León-Asturias clusters are strongly related in the cladogram. I'll discuss this below.

Another very weird feature is the presence of a group in Pontevedra province (Galicia) that is the most different of all, even more distinct than Basques. It is composed of many small highly endogamous subgroups. I do not have at the moment any explanation for this, honestly.


External influences: mostly "French"


When modeled as a mixture of external populations, Iberians are mostly French (or something that approaches that label), although "mostly" varies from c. 60% in the West to c. 90% in Gipuzkoa. This pattern of "Frenchness" is reminiscent of the distribution of R1b-S116. Correlation is not causation, but it is still correlation, and R1b-S116 seems to stem from somewhere in France and to have arrived in the Peninsula at least as early as the Bronze Age (or maybe before but still undetected, terminus ante quem at Los Lagos, as discussed recently).

Supp. Fig 5a

The population most affected by this French influence is Basque1, which shows no significant contribution from any other source (only a very small one from Italy1 and a tiny one from North Morocco, see supp. fig. 7), but the authors say (supp. info.):

Notably, the Basque-centred cluster has a markedly different profile from the rest. Firstly, it has much lower, or zero contributions from donor groups that contribute to all other clusters: Italy, NorthMorocco, and WesternSahara, and a very large contribution of 91% (88-93) from France. Additionally, the model fit for this cluster is strikingly less good than that for the other clusters (Supplementary Figure 4d), suggesting that Basque-like DNA is less well captured by the mixture of donor groups in this data set. Specifically the Basque share even more DNA with the French group than predicted by their mixture representation, which might reflect, for example, that the DNA the Basque share with present-day French is only a subset of modern French ancestry. This pattern is seen for other Spanish groups also, but to a much lesser extent.

Area that demands urgent genetic research

So it seems we may be dealing with some sort of "paleo-French" rather than modern Indoeuropeanized French. 

All genetic roads lead to France, at least in Western Europe: it also happens in Great Britain and Ireland, and it is very apparent in the geographically sorted phylogeny of R1b-S116. And it is also in this area that we see the earliest signs of mitochondrial DNA "modernity": in Paternanbidea (Navarre) and Gurgy (Burgundy), an area that demands much greater attention from genetic and archaeogenetic research than it has received to this day.









The other major contributors are: Italy (mostly Italy1), with peaks of c. 20%, influencing mostly the South and Center; North Morocco, with a peak of c. 10% in Portugal and a West and South distribution; and Ireland, with a peak of c. 6% in Eastern Asturias and a broadly Western distribution.


Italian contribution (Italy1)


North Morocco contribution
Ireland contribution













What exactly do these contributor components mean? Hard to say, although part of the Italian and North Moroccan elements could well be related to historical episodes such as the Roman and Muslim conquests. But only partly so, because the North African element in Galicia just cannot be that high only from a Muslim conquest that was very limited in time, nor should we expect so much "Muslim" or "Roman" influence in the remote and largely ignored area of modern Portugal: there must be more ancient origins, probably dating to the Neolithic, Chalcolithic or Bronze Age.

minor West Sahara contribution
And in the case of the North African component we may have a guide in a minor West Saharan contribution (at right), which may well reflect an older and "purer" form of North Africanness and which is again concentrated in Portugal and Galicia, extending to parts of the Central Plateau, but does not affect the South, the area where we should expect most of the Muslim period's genetic influence.

We cannot draw a line in Portugal because of the uncertainty about the geographic origins of the samples, but we can do so within the boundaries of Spain, and that line suggests that the Muslim genetic influence could be intense in the southern third and maybe all the way to Zamora in the west, but should not be relevant in Galicia, Asturias or (inferred, uncertain) much of Portugal. In these areas, the North African element is peculiar and looks older than the Emirate/Caliphate of Cordoba.

Speculating on the possible origins of the Iberian clusters


This part has given me a true headache. It is very hard to understand how these clusters formed and I will not pretend here that I have all the answers. The strangest thing of all is the affiliation of the Central-West and Eastern clusters.

The problem is not only the highly implausible relation between Asturias-León and West Andalusia, which the authors seem to believe to be the product of historical colonization at the time of the Reconquista (13th century). That makes no sense whatsoever, because the Kingdom of Seville was never part of the barely autonomous Kingdom of León but an administrative division of Castile (of which León was by then just a dependency). We should thus see at least some important influence of the Central (yellow triangles) cluster, which is dominant in Valladolid, Madrid and even the city (but not the countryside) of Burgos, and we do not see anything like that.

One possibility is of course that the components, or some of them, are not that real, but I do not see any indication of that in the study, so, pending independent replication, I'll take them at face value.

So why then? I've been scratching my head until I could not think any further, I swear. 

And this is my hypothesis, risky as it may be:

1. The essence of the split between the related Spanish components (excluding Galicians and Basques) and the Portuguese-Galician component could be at the Early Neolithic. When I mask the areas not or weakly affected by the Earliest Neolithic in the components map I get this:



... which seems to correspond oddly well to the first major split in the cladogram between the Portuguese-Galician (purple) component and the rest.

2. The expansion inwards may correlate with Chalcolithic and Bronze Age processes, which seem to be very important everywhere, Iberia included. So I used the Bell Beaker map I copied from Harrison (see here) as the cutoff for another mask (radii are relative to the Bell Beaker density circles in the reference map):



If so the split between the Central (yellow) and East (orange) groups (to which the brown and red and other groups are closely affiliated) could be related to this Bell Beaker period and derived Bronze Age cultures. The yellow or Central component could originate in Los Millares (Almería province) and spread northwards to Ciempozuelos (Madrid province) and from there to other areas with the Cogotas I culture of the Bronze Age. 

The Purple (Western) component should be somehow related to the Zambujal or Vila Nova de São Pedro (VNSP) culture of Portuguese Estremadura, spreading northwards to tin-rich Galicia with the Montelavar group, maybe already in the Bronze Age.

The mysterious Red (Central-West) component could be related to some colonization of that area from the Bell Beaker dense area of Catalonia or the Denia district, or maybe even an older colonization, hard to say. What I know of that area in late Prehistory is that it is ill-defined, partly for lack of research in the heavily farmed alluvial plain, and that it correlates with Southern Portugal but not fully, always showing a distinct personality, until it grows a clearly distinct personality in the Tartessian period, already in the Iron Age. It is also clear that the so-called Silver Road runs straight through that cluster and that it was important, and increasingly so, in Late Prehistory, having both commercial and religious significance and being clearly the main path of penetration of Phoenician influences inland, already in the proto-historical period.

While still tentative, this speculative Silver Road explanation seems to make much better sense than the Reconquista hypothesis that the paper appears to espouse, which I find nonsensical because the observed patterns are not what we would expect.

But of course it is always up to you to make up your own mind, I'm just offering some variant considerations that for me make some sense but that are by no means a well finished theory either, just better than the simplistic historical interpretation, which does not fit the facts too well.

A new genetics blog in Spanish


Wilhelm H > DNA and Genealogy, by some guy called Wilhelm Halys, whom I know from Facebook. 

His very first post, on endogamy using RootsFinder, seems very interesting, even if I have to admit I don't understand it well because I'm unfamiliar with this program. But that's good because it indicates novelty, freshness and stuff to learn.

I hope you also find it interesting, assuming you can read Spanish or use a translator.