Continuing on the subject of South American languages, possible long-range relationships between them, and whether such relationships can enlighten us about the viability of the Amerind hypothesis, I would like to propose a tentative ‘macro’-classification. This will be quite different both from that of the splitters (who don’t propose macro-families at all) and from that of the lumpers (by which I mean Greenberg’s classification, so often repeated in the scientific literature).
Readers may have seen the map on the right in a previous post. Greenberg’s classification of the South American languages divided them into broad phyla such as “Andean-Chibchan-Paezan” and “Ge-Pano-Carib”. Ideally, such divisions should be based on shared innovations and vocabulary, but that is not always the case (in fact, it seems that Greenberg’s classifications were ready in his mind before he started looking at the data). Now, I believe a much more accurate, “data-driven” scheme can be devised if the following criteria are taken into account: 1. similarity in morphology; 2. personal pronouns; 3. shared retentions or innovations in basic vocabulary. Let us examine each in turn.
Prefixing vs Suffixing
First of all, the similarities in morphology. I have written a whole post about the differences between “Andean” and “Amazonian” languages, stressing, among other things, how the former tend to be suffixing whereas the latter typically employ more prefixes. I will not repeat the evidence here, but surely one could argue that it bespeaks only areal features. However, the argument could also run the other way round, i.e. that grammar is rarely borrowed. Otherwise, how can we explain anomalies such as Ket, with its complex verb chain, in the middle of a sea of typically Eurasian languages (Indo-European, Uralic, Tungusic) with which it has probably been in contact for a long time? A language’s morphology is not immutable (remember the example of Egyptian?), but it might tell us a lot about its historical relationships. After all, is it not at the very origin of the recognition of the Indo-European and Afro-Asiatic families?
N : M vs I : A
Shared personal pronouns are usually considered strong evidence that languages are related. At the same time, they can deceive, since all languages tend to use unmarked phonemes for pronouns (e.g. n, m, k, t…) and, therefore, similarities may arise by coincidence. Nevertheless, as I have been repeating for a while, the question then is: shouldn’t the distribution be random? That is, why would some languages choose a set of unmarked consonants (m : t in Eurasia) while others choose a distinct set (n : m in the Americas)? That is when genetic explanations are more likely. The mention of the controversial n : m series was not fortuitous, as it is quite productive in the classification of South American languages: most families exhibit this ‘pan-American’ pair, but not all! Consider the following example:
A quick look at the evidence above, even without prior knowledge of the geographical distribution of the languages, would lead to their classification into two, or maybe three, groups. Most of them use the pan-American n : m series for the 1st and 2nd persons. However, the Andean languages – Quechua, Aymara, Mapudungun – add a suffix to the 1st person (*-qa, *-ja, -ʧe) and a prefix to the 2nd person (*qa-, *xu-, ej-). This, of course, occurs only in the independent form of the pronouns; in Mapudungun, the 1st and 2nd person verbal suffixes are -n and -mi. The other languages highlighted in blue are Amazonian (or intermediate between highlands and lowlands, in the case of Chibcha). They use the n : m series in a monosyllabic or prefixed form. Pano and Tukano have particularly similar forms. In contrast, other lowland languages appear to have replaced the pan-American pronouns with a vowel pair. This pair, which is i : a in the case of Macro-Jê and Guaykuru, also appears in the Maya languages.
Does that pattern have any implications for classification, or is it just a coincidence? As I have mentioned before, the Macro-Jê and Tupi languages have long been thought to form a macro-family, together with Karib. It could be suggested that the Guaykuru languages are part of the same group, though a bit removed (this is confirmed through shared vocabulary, as I will show below). Look at a map and you will see that these languages occupy a huge part of eastern South America. They appear to have originated in SW Amazonia. The other Amazonian languages, the ones that preserved the n : m pair, have a more restricted distribution in NW Amazonia (except for Arawak). Thus, we can start to envisage a distinction between the Andean/Patagonian phylum and two separate Amazonian phyla, one of which is characterised by widespread geographical distribution and the i : a innovation in the personal pronouns.
*LAQ’w vs *NENE
Shared vocabulary is the last kind of evidence, and the most prone to borrowing. That is why I insist it must be basic vocabulary with close semantic correspondence. This is a major problem with some of the etymologies in the AED, as in the case of the famous *t’ina ~ *t’ana ~ *t’una, where any relative (or, in fact, human being!) can be compared to any other. This problem exists even in works like the Altaic etymological dictionary of Starostin, and its absence is one of the reasons why I think the Dené-Yeniseian hypothesis is plausible (items of basic vocabulary, with close meanings and regular correspondences, connect the two families).
Shared innovations in basic vocabulary can help distinguish subgroups. Let us take as an example the Romance languages. Even though they are all derived from Vulgar Latin, we can show that some are closer relatives than others, and that is reflected in vocabulary: for “leg”, Portuguese and Spanish have perna and pierna respectively, whereas French and Italian have jambe and gamba. For “head”, we have cabeça and cabeza against tête and testa. For “morning”, manhã and mañana against matin and mattina, and so on. Needless to say, Portuguese and Spanish are closer to each other than either is to French or Italian.
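The Romance comparison can be put in the form of a small tally, which is roughly the procedure I follow with the South American wordlists. This is only a minimal sketch: the word forms are the ones quoted above, while the class labels "A" and "B" and the names ROMANCE and shared_class_counts are hypothetical, introduced purely for illustration. Note that it merely counts how often two languages fall in the same lexical class; it does not decide which class is the innovation and which the retention.

```python
from itertools import combinations

# Forms quoted in the text. The labels "A"/"B" are arbitrary tags for the two
# competing lexical sets of each meaning (an assumption made for illustration).
ROMANCE = {
    "leg": {
        "Portuguese": ("perna", "A"), "Spanish": ("pierna", "A"),
        "French": ("jambe", "B"), "Italian": ("gamba", "B"),
    },
    "head": {
        "Portuguese": ("cabeça", "A"), "Spanish": ("cabeza", "A"),
        "French": ("tête", "B"), "Italian": ("testa", "B"),
    },
    "morning": {
        "Portuguese": ("manhã", "A"), "Spanish": ("mañana", "A"),
        "French": ("matin", "B"), "Italian": ("mattina", "B"),
    },
}


def shared_class_counts(wordlist):
    """For every pair of languages, count the meanings in which both
    languages carry the same class label."""
    counts = {}
    for forms in wordlist.values():
        # Sort language names so each pair always appears in the same order.
        for a, b in combinations(sorted(forms), 2):
            counts.setdefault((a, b), 0)
            if forms[a][1] == forms[b][1]:
                counts[(a, b)] += 1
    return counts


counts = shared_class_counts(ROMANCE)
# Print pairs from most to fewest shared classes.
for pair, n in sorted(counts.items(), key=lambda item: -item[1]):
    print(pair, n)
```

With only three meanings the tally is trivially clear-cut, but the same counting can be applied to any wordlist in which the competing forms have already been sorted into classes by hand.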
After scrutinising wordlists of South American languages (admittedly not the best way to do historical linguistics), I believe I have identified six very stable words that are particularly useful in distinguishing subgroups. These six words include body parts, but also a simple noun, a verb and a numeral: ashes, foot, to sleep, tongue, tooth, two.
The table above shows these six words in thirteen languages (or, rather, proto-languages in most cases). I have highlighted, in red and blue, contrasting pairs of words that could be derived from common roots, given their phonological similarity. The gaps in some of the languages are either because the words are not reconstructible, or because I do not have a complete wordlist at my disposal. Some of the “cognates” are only tentative. The possible relationship of PQ *qaʎu and Kaw. qala- to forms such as Kun. lassi has been explained previously. PQ also has *ʎaqwa- ‘to lick’, so metathesis cannot be ruled out. The PTp word for ‘tongue’, despite the resemblance, cannot be related to that set, as I will show in the next post.
What is clear from the comparison above is that some languages appear all in red, others all in blue, and some fall in between. This is exemplified by the title of this section, where I make reference to two words for ‘tongue’ (or ‘to lick’) found in the AED, the first of which appears in the Andes, the second in the lowlands. At one extreme, in red, we have the core Andean/Patagonian languages. At the other, in blue, we have the core Amazonian languages – Tupi, Macro-Jê and Karib – confirming the evidence from personal pronouns. Some Amazonian languages fall in between: Arawak and Pano. They are also the ones that preserved the pan-American n : m pronouns, also present in the Andean languages. I don’t know enough about the Tukano languages to risk a classification with such a limited number of words as in the table above, but other vocabulary evidence (plus the pronouns) would place them not far from Pano and Arawak. The usefulness of such an approach, as opposed to mere geographical convenience, is that some surprises may emerge: Allentiac, a Huarpean language of NW Argentina, is completely Amazonian in its basic vocabulary.
I end this post with a map that reflects my current view of the macro-relationships between the South American families. The primordial division, about which I’ve written before, is that between “Andean” and “Amazonian” languages. To put it in more neutral terms (since there are “Amazonian”-type languages well outside of that basin), we can call them “Highland” and “Lowland” or, even better, “Western” and “Eastern” South American languages – the option adopted in the map. There is a core group of “Andean-Patagonian” languages that includes Quechua, Aymara, Mapudungun, Kaweskar and Chon. Possibly Chimu (Mochica) falls in that group, but I do not have enough material at my disposal to ascertain that. Chibcha is probably part of the Western division, though it displays some Amazonian features. I cannot specify the position of isolates like Kunza or small families like Puelche within the group, but they are definitely Western.
Among the Eastern languages, the situation is complex. There is a core group of three families – Macro-Jê, Tupi and Karib – whose deep relationship I take as certain. I like to call them the “Neo-Amazonian” languages, given their relatively recent spread from an Amazonian homeland over enormous areas of the South American continent. Pronominal and vocabulary evidence shows that Guaykuru (and almost certainly Mataco, though I do not have the data with me) is not far removed from that group. In the map, I show all of them under the label of “Chaco-Amazonian”. They contrast with languages that did not spread as much, remaining since early times in their original homelands. On the one hand, we have the Amazonian families Pano and Tukano, whose deep relationship I take as a serious hypothesis. We can call them “Palaeo-Amazonian” families. On the other, we have those that seem to occupy a similar historical position in the Chaco (or on its border), such as Nambikwara and Zamuco. I propose to call them “Palaeo-Chacoan”. Many small families and isolates (Huarpe, Witoto etc.) are Eastern languages, but their relationship with the others is less certain. Finally, Arawak occupies a strange position. I used to think it had to be related to the ‘core’ Amazonian languages, but after staring long at the pronominal, vocabulary and morphological evidence, I am sure it is far removed even within the Eastern group. Its likely western Amazonian origins, close to the Palaeo-Amazonian languages, would be consistent with its being an ancient split.