Spread Zones: Europe and the Neolithic Survivors

I have been away for too long, due to other writing activities that kept me extremely busy. In spite of that long gap, this will not be a new cycle of posts, but the end (at least for the moment) of a series of previous topics. I will, however, touch on a variety of subjects and express my opinion about the most contentious question of European linguistics and archaeology – the timing of the spread of Indo-European languages. My opinion on this problem has been decided (for now) by the evidence provided by Basque and the isolated languages of the Mediterranean.

From the now classic Guns, Germs and Steel to the more recent Prisoners of Geography, the idea that the terrain inhabited by a group largely determines the events of their history has become widespread among the general public. Although reality is much more intricate than that, the same principle is the basis of mosaic versus spread zone models: large navigable rivers and extensive flat plains will tend to experience wave after wave of language replacement. In contrast, mountainous areas, islands and other inaccessible terrains tend to be refugia where millennia of uninterrupted development result in a myriad of languages disconnected from large families. Obviously there are exceptions – as in the Andean case that I examined previously.

In any case, it is undeniable (as Russia well knows) that from the Asian steppe to Central Europe there is really no geographical barrier – as long as one keeps south of the Urals. The diffusion of Indo-European languages into Europe did not need to overcome all that distance, moving only from the surroundings of the Caspian Sea to the West (assuming a Yamna origin!). Curiously, the Neolithic diffusion followed a different path altogether, from the Levant, through Anatolia, to the Mediterranean, resembling part of the (much later) Silk Road. The big question is whether these two events (Neolithic and Indo-European) coincide, as proposed originally by Colin Renfrew. I assumed this theory to be pretty much dead, at least among linguists, but I was mistaken (see Gray and Atkinson’s paper and, more recently, Bouckaert and colleagues’). Fortunately, genetics has been playing a major role in redefining our views about ancient migrations, and I became convinced that genetic evidence does not support a Neolithic age for Indo-European. Nevertheless, given the difficult association between genes and languages, I am sure the matter will continue to be hotly debated for years to come.

Map of the 5th principal component of 94 genes in Europe (from History and Geography of Human Genes)

The Pre-Indo-European languages of Europe (and their speakers) offer formidable clues to the problem. I have written about the isolates of Eurasia in a previous post, but let us explore some more facts about the last speakers of a Pre-Indo-European language: the Basques. Even before the modern DNA studies, the Basques were known to be different from their neighbours. For example, using only blood types, it was noticed that Basques were predominantly O and had the highest incidence of Rh- in Europe. With the first DNA analyses of a large number of European populations, it became clear that Basques were indeed genetically distinct. I have previously shown some maps of principal component analyses, and here I reproduce the map of the 5th principal component for Europe, based on the 94 genes analysed by Cavalli-Sforza and colleagues. This principal component peaks at the Basque country, which is at the opposite extreme from most of Northern/Central Europe and the Balkans, although with some similarity to the remainder of the Iberian Peninsula.
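For readers who have not met principal component maps before, the idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the population labels and allele frequencies are invented (they are not Cavalli-Sforza's data), and the sketch extracts only the leading component by simple power iteration, whereas the published maps rest on many more genes and several components. The point is only that a population with unusual frequencies ends up at one extreme of a component.

```python
# Toy illustration of a principal component score per population.
# All labels and allele frequencies are invented for this example.
freqs = {
    "pop_A": [0.10, 0.20, 0.80, 0.30],
    "pop_B": [0.12, 0.22, 0.78, 0.28],
    "pop_C": [0.15, 0.25, 0.75, 0.25],
    "pop_D": [0.11, 0.21, 0.79, 0.31],
    "outlier": [0.60, 0.70, 0.30, 0.90],  # hypothetical isolated population
}


def center(rows):
    """Subtract the mean frequency of each gene (column) from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Gene-by-gene sample covariance matrix of already centred rows."""
    n, k = len(rows), len(rows[0])
    return [
        [sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(k)]
        for i in range(k)
    ]


def leading_component(cov, iterations=200):
    """Approximate the first principal axis by power iteration."""
    k = len(cov)
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


names = list(freqs)
centered = center([freqs[name] for name in names])
pc1 = leading_component(covariance(centered))

# Score of each population = projection of its centred frequencies on pc1.
scores = {
    name: sum(x * w for x, w in zip(row, pc1))
    for name, row in zip(names, centered)
}

for name, score in scores.items():
    print(f"{name}: {score:+.3f}")
```

Plotting such scores at each population's location, and interpolating between them, is what produces a map like the one reproduced here.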

Keeping in mind that the first principal component shows a gradient from Greece to the northwest (Neolithic?) and the second principal component radiates from the north of the Black Sea (Bronze Age/Yamna?), I believe there are only two ways Cavalli-Sforza’s data can be interpreted:

  1. The Basques are Paleolithic/Mesolithic “survivors” (ergo Neolithic migrants spread the Indo-European languages);
  2. They are Neolithic “survivors” (ergo Indo-European languages arrived during the Bronze Age).

By the way, I am assuming here that the massive genetic legacy of the Neolithic and Bronze Age expansions (see below) must have had a linguistic correlate. In theory, one can imagine a situation in which large numbers of migrants arrive in a region, become culturally dominant and have children with locals without causing language replacement, but I would like to see real-world examples of that. In fact, it seems that one only needs a small number of “conquering” migrants with minimal genetic impact to change the language of whole regions (that is the case in some Latin American countries and, further back in time, was the case of Hungary).

Fortunately, we have advanced much since Cavalli-Sforza and colleagues’ original work. For example, it is clear now that the Neolithic expansion in Europe did involve population movement, and that migrations from Anatolia were indeed the source. In my opinion, the most significant genetic piece of evidence is the discovery that modern Basques are the closest living population to Neolithic skeletons from the same region. Sardinians were also found to be very close and, in fact, Sardinians appear in every study cited here as the closest modern match to Neolithic DNA samples. Their status as a “relic” population in Europe due to isolation was noticed long ago, when Cavalli-Sforza left Sardinians out of the PCA due to their singularity within Europe. The genetic similarity between Sardinians/Basques and Neolithic samples deserves special attention.


One important aspect that has been taken into consideration recently is the distribution of haplogroups in Europe. Unlike the autosomal data referred to above, which tells us about the admixture in the ancestry of an individual, Y-chromosome and mtDNA haplogroups are specific mutations that are passed down over generations and preserve the histories of migrations of particular male and female lineages. The Y-chromosome haplogroups most closely associated with the Neolithic are E1b1b and G2a. Both of them are rare in modern Europe as a whole, but G2a was the dominant haplogroup among Neolithic farmers of Central Europe, France and Spain. Both E1b and G originate outside of Europe, in the Levant and the Caucasus respectively, and must have been brought to Europe by Neolithic migrants. I will not reproduce the beautiful maps from Eupedia, but in the map above I highlight the areas where E1b and G are nowadays more common. The Mediterranean (including Sardinia) and the Balkans are modern refugia of those two haplogroups, but what happened to Central Europe? In fact, paternal lineages from most of Central and Western Europe were later replaced by haplogroups R1a and R1b, carried by Bronze Age migrants (confirmed by the fact that Yamna samples were recently shown to be predominantly R1b). Surprisingly, most modern Basques actually belong to haplogroup R1b (perhaps “bottleneck effects” could lead to the replacement of Y-chromosome lineages over a few generations in such an isolated population?).

In summary, the Neolithic expansion in Europe involved considerable population movements, the genetic signature of which can still be seen and is most noticeable in the Mediterranean and the Balkans. These areas were somewhat less impacted by the later Bronze Age migrations, which also changed considerably the genetic make-up of Central and Western Europe. The fact is that, since there is no other major language expansion after Indo-European, this event must have occurred during the Bronze Age. Some supporters of the Anatolian hypothesis do not deny that fact, arguing that several waves of Indo-European expansion could have occurred. Using the principle of Occam’s razor, I would immediately discard this explanation as being too complicated. The question then is: would the Neolithic expansion also have involved the diffusion of a single, widespread language family?

Pre-Indo-European languages in relation to the major cultural traditions of the Neolithic.

In the map above, I am showing some of the main Neolithic cultures of Europe in relation to the isolated languages that we know about. The Linearbandkeramik (LBK) took advantage of the plains that extend from Ukraine to France – a huge spread zone – and it is difficult not to imagine that it was accompanied by the diffusion of a single language [family] around 7000 years ago. At the same time, the Cardial/Impressed culture spread along the Mediterranean shores. Both ultimately stem from the Greek Neolithic (with clear origins in Anatolia), but seem to be local developments. Whether both involved the expansion of the same language [family], I will not dare to speculate, as there is no clue as to what the LBK people spoke (perhaps it is in the substrate of the Germanic languages). But could there be a connection between the other Pre-Indo-European languages recorded in Southern Europe?

That Basque was part of a larger language family in the past is well accepted, but what was its extension? Some have suggested the widespread occurrence of Basque-like elements in the toponymy of Europe: e.g. the connection between Val d’Aran in Spain, Arundel in England and Ahrntal in the Alps would be the Basque word aran “valley”. There have also been some attempts to connect Basque to the extinct Pre-Indo-European language of Sardinia – which we can call “Paleo-Sardinian” but is also known as “Nuraghian”. Beyond genetic isolation, the linguistic isolation of Sardinia is clear even in (relatively) recent times: for example, whereas all other Romance languages have turned Classical Latin C before front vowels into /tʃ/, /ʃ/ or /s/, in Sardinian it is still pronounced as a hard /k/. As for the Paleo-Sardinian language, we have no record of it except for words that entered modern Sardinian or the toponyms on the island. It is based on those that Blasco Ferrer proposes parallels with Basque – e.g. the triad Lur-beltz, Lur-gorri and Lur-zuri meaning “black”, “red” and “white earth” respectively, which appears on the island as Duru-nele, Lúr-kuri and Lu-tzurró.

Among the well-known Pre-Indo-European isolates is the Etruscan language. There is a possibility that it is actually connected to Rhaetic, a language or group of languages preserved on a few inscriptions around the Alps. Given the small corpus, it is unlikely that it will ever be “deciphered”, but formally there are some resemblances with Etruscan – e.g. a common ending -ce or -ke that, in the latter, marks the past tense. Another candidate to form a family together with Etruscan is Lemnian, attested in a few inscriptions on the Greek island of Lemnos. If Etruscan, Rhaetic and Lemnian are indeed part of a single family, later fragmented by the spread of Indo-European, then we are potentially dealing with yet another group of Neolithic “relics”. Then, the location of those languages in relation to the Cardial/Impressed culture and to the modern distribution of Y-chromosome haplogroups E1b1b and G2a, as can be seen in the maps above, starts to make sense.

Finally, let us consider the Pre-Indo-European languages (directly or indirectly attested) from the region closest to the origins of the Neolithic. One of the most interesting things about the Greek language is that it absorbed a large amount of non-Indo-European vocabulary. These are words that have no cognates in other Indo-European languages and are also easy to spot based on their distinctive phonology/morphology. Interestingly, they tend to be cultural items – words like σῦκον “fig”, ἔλαιον “olive oil”, θάλασσα “sea” and βασιλεύς “king” (in Linear B qa-si-re-u). It is not impossible that, like the Linear B syllabary, the substratum in Greek might be related to the language once spoken in Crete and partly preserved in the Linear A script. This language, which we may call Minoan, is probably never going to be fully deciphered given the small size of the corpus and the fact that most of it consists of accounting tablets full of personal names and toponyms. Names of products are written with logograms, so it is impossible to know how they were pronounced. A few transaction words (see below) and inflected forms (like the famous ja-sa-sa-ra-me that I mentioned previously) offer a window into Minoan, but the known vocabulary is so small that the language is destined to remain unclassified.

In summary, it is likely that all these languages are remnants of a once widespread Neolithic family (or families), diffused together with the Cardial/Impressed culture along the Mediterranean. We simply do not have enough material to show how they are related, although a convincing case can be made for Etruscan, Lemnian and Rhaetic. Basque could be related to this phenomenon or be part of a different family spread further west, perhaps in association with the Megalithic traditions of the Atlantic Neolithic. It is almost certain that the LBK expansion brought a single language or language family to its vast territory, but whether it was also related to the Mediterranean variants is impossible to tell.

Example of a Linear A tablet, now at the museum of Heraklion. Logograms are transcribed with Latin words (vir for man, fic for fig). Most of the words spelled with the syllabary are personal and place names. In this tablet, you can see some of the exceptions: ki-ki-na seems to be an adjective describing the figs, or maybe it is the word for fig itself repeated after the logogram. Transaction words usually appear as headers or at the end of the tablet. In this example, a-du and ki-ro could mean something like “balance”. Ku-ro is followed by a number that is the sum of all previous items and must mean “total”.

5 thoughts on “Spread Zones: Europe and the Neolithic Survivors”

  1. I absolutely love this, although I personally am much more iffy about linking languages and genetics because of the case of my own home region, the Horn of Africa. There was a recent study by Hodgson & colleagues (2014) that actually disproved the link between Semitic languages and the appearance of Arabian input in the genetic make-up of the Horn of Africa, linking the great majority of the ancestry to around 24,000 years ago. Regardless, the Europe case is so interesting and I really enjoyed this post.


    1. Thanks! This is an interesting case. I don’t know much about the Horn of Africa, but I expected Ethiopians to show some recent Middle Eastern admixture, even if minimal. This paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1181965/) found a large percentage of haplogroup J among Amhara speakers. Perhaps the key is to look at ancient individuals. For example, in Hungary the modern population does not differ from other Eastern Europeans, but skeletons from 1000 years ago had haplogroup N, confirming a Siberian ancestry. But you are absolutely right in that language shift can happen without much population displacement. However, I would argue that the opposite is never true: if you can demonstrate that a large population expansion took place, then you can be almost certain that language expanded with it.


      1. If I’m not mistaken it was found that speakers of Yemsa, a North Omotic language, have higher percentages of J than neighboring Semitic-speakers, which isn’t odd. Haplogroup J itself is debatably around 48,000 years old, with J1 being debatably ~24,000 years old. There’s also the case of many Afar-speakers actually having an unusually high percentage of J1 despite being a highly endogamous ethnic group for the most part. I actually have some posts on my blog as to why the Semitic-speakers themselves can’t be the origin, although there’s at least one case of a possible back-migration into the Horn of Africa by Semitic-speakers that only left loanwords in some Lowland East Cushitic languages in the Somali peninsula. But I also rely on the evidence for Semitic originating in the Horn of Africa, from findings starting in the 1970s with the work of Hetzron, which actually fill the gaps that appear if you place “Proto-Semitic” in the Levant as opposed to the Horn of Africa.


