Amerind from a South American perspective: Part I

Expanding on the previous topics, I would like to dedicate a series of (possibly three) posts to the problem of classifying South American languages into broader groups. This is partly due to my own recent efforts at comparing well-established proto-languages (and a few isolates) on that continent, but it will also illustrate some of the fundamental challenges of the Amerind hypothesis as a whole.

First of all, a personal note: it has been said that Amerind – much like Nostratic or Dené-Caucasian – is a matter of faith rather than science. Although I do not entirely agree with that, I thought I should clearly state my credo: I believe that all languages of South America are related and, on a larger scale, that most languages of the Americas are. But what does that mean exactly? Because you cannot disprove a relationship between two languages, the burden of proof is on whoever proposes it. Unless you believe in polygenesis (which I do not), all languages ultimately derive from the same ancestor, and saying that two of them are ‘related’ only means that they are more closely related to each other than to any others. Most importantly, if you want to do serious research instead of indulging in unsystematic speculation, the proposed relationship must be demonstrable through valid linguistic methods – regular sound correspondences, shared basic vocabulary and grammar, etc. Even if all languages derive from a common source, not all relationships can be demonstrated, because the enormous time-scale involved has obliterated the most distant ones (‘proto-world / proto-sapiens’ claims notwithstanding!).

Given those considerations, here is how I should properly state my position: 1. all South American language families are more closely related to each other than to, let’s say, North American ones; and 2. these relationships are demonstrable because 3. the time elapsed since their divergence was not long enough to obliterate them.

The first point is the most important but, unfortunately, also the weakest until a better understanding of the peopling of South America is achieved. I have previously mentioned the archaeological confusion that reigns on the continent for the Late Pleistocene-Early Holocene, the evidence of 14,000–18,000-year-old occupations, and the genetic clues of an early, non-Mongoloid migration. Whether or not such early migrations were dead ends (leaving no linguistic imprint in South America), there does not appear to be any extraneous influence on the continent after 12,000 years ago, so at least the macro-families dating to that time should be recognisable.

Etymologies versus lookalikes

In this first post, I will only illustrate the importance of using valid methods in long-range comparison by examining some entries in the Amerind Etymological Dictionary (AED). I will use as examples the Tupi and Macro-Jê families, with which I have some familiarity. There are 253 Macro-Jê and 114 Tupi etymologies in the dictionary. I will focus on entries of (relatively) basic vocabulary with reflexes in both families. Because there are still many of those, I have restricted this post to the first ten entries with these characteristics. Fortunately, they are very representative of the whole: very few of them are sound, and the others help to illustrate the problems that permeate the AED, namely: 1) arbitrary segmentation; 2) inclusion of words that are not widespread within a family (and whose antiquity is thus questionable); and 3) splitting a well-established cognate set of a language family into multiple ‘Amerind’ etymologies.

Before we start, one thing must be very clear. The AED is a bona fide work that demanded much time and energy from its authors, bearing witness to an erudition and panoptic perspective that I do not claim to possess. Some of the entries may one day be proven to be valid etymologies. Others may reveal interesting long-range loanwords and thus shed light on the prehistory of the continent. Overall, however, the AED is similar to other ‘etymological’ dictionaries whose value is questioned by specialists, such as the Altaic or North Caucasian dictionaries of Starostin and others. The mistakes found there are as embarrassing as those in the AED, as reviews have pointed out. The authors of such dictionaries are undoubtedly competent, intelligent linguists whose work demanded much research and time. However, contrary to their claims, the fact that such works have been written does not prove the validity of the proposed macro-families. It only proves one thing: that, with enough research and time, long etymological dictionaries can be written connecting any two language families of the world.

Now, for the etymologies.

[Note: PTG = Proto-Tupi-Guarani; PJ = Proto-Jê; PT = Proto-Tupi; PMJ = Proto-Macro-Jê. The spellings in the excerpts are exactly as given in the AED. When I cite specific proto-words, I use the reconstructions of Correa da Silva 2010 (PT) and Nikulin 2015 (PMJ), occasionally modified where I disagree with them.]

3. ABOVE3 Equatorial: Tupi: Chiripa rakã ‘head’. Macro-Ge: Caraja: Javaje rahah ‘head’. COMMENTS: Chiripa belongs to the southernmost division of Tupi-Guarani, which is only one of the branches of the Tupi family, but it is very representative of that branch. The PTG form was *ʔa-kaŋ; the form cited in the AED includes a “detachable r-”, about which I wrote briefly in a previous post. The PTG word, in turn, derives from a combination of two PT roots: *ʔa ‘head’ and *kãŋ ‘bone’. As for Macro-Jê, the Karajá word ra (the longer Javaé form must have been chosen because it better resembles the Tupi one) has a well-known etymology within the family, being cognate with, e.g., PJ *krã and Krenak krεn, all of which go back to PMJ *krãj. PT *ʔa ‘head’ and PMJ *krãj ‘head’ do not appear to be cognates, even if some of the daughter languages eventually developed forms that look alike by pure chance. As is not unusual in the AED, some reflexes of PMJ *krãj ‘head’ appear in a different etymology altogether (4. ABOVE4).

25. ARM1 Equatorial: Proto-Tupi *po ‘hand’, Tupi po, Guarani po, Guayaqui i-pa, Kaapor n-po, Cocama puwa, Kepkeriwat ba. Macro-Ge: Chiquito i-pa, Erikbaktsa -čipa, Proto-Ge *pa, Guato (ma-)po, Kaingang: Apucarana pe, Tibagi pen, Opaie (či-)pe. COMMENTS: This is, in essence, a valid etymology. PT actually has two possible reconstructions, *po and *mo. The entry in the AED conflates two distinct PMJ roots, *mo ‘hand’ and *paC ‘arm’, only the first of which is cognate with the PT root. The ‘hand’ root also has clear Panoan cognates that have been ignored in the dictionary’s entry (they have been included in a different etymology, 26. ARM2), while the ‘arm’ root has some obvious Andean cognates that have also been missed. I will elaborate on this cognate set in the third post of this series, so I will refrain from further comments now, except to note that Guató (even if this particular word is probably cognate) is an isolate, as there is no compelling evidence for its classification as Macro-Jê.

57. BELLY2 Equatorial: Tupi: Shipaya parua ‘belly’, Arikem pera ‘navel’, Uainuma punua ‘navel’, purua ‘navel’, etc. Macro-Ge: Bororo: Umotina upuru ‘thorax’, Fulnio epatio ‘upper abdomen’, Ge: Apinage pitãn ‘body’, Crengez patu ‘belly, chest’, Kaingang: Serra do Chagu (idfe-)paro ‘chest’, Puri: Coroado puara ‘chest’. COMMENTS: The Tupi part does not seem to be intrinsically wrong, though I have not seen a reconstruction for such a cognate set in the recent literature. As for Macro-Jê, there are a few problems. The words cited for the Northern Jê languages are definitely not cognates, since the Krenyê form goes back to PJ *tu(m) ‘belly’ with the 1st person plural possessive pa-. Funnily enough, the same word, without the prefix, appears in a different etymology (59. BELLY4). For Kaingang, the AED presents a variant from a rather obscure dialect and exposes a methodological flaw that, unfortunately, is typical of the work as a whole: the word meaning ‘chest’ is exactly the one that was put in parentheses as if it were some sort of unnecessary prefix! idfe- is really ĩɲ ɸe ‘my chest’ (I could not find a meaning for paro). Kaingang ɸ traces back to PJ *s-, so this word is not cognate with the other Jê words. This leaves us with the Umutina and Coroado forms (I doubt Fulniô epatio is related). If these two go back to some PMJ root, and the same is true for the Tupi words in this entry, then we might have a valid cognate set – the many mistakes in the AED notwithstanding.

73. BLACK4 Equatorial: Tupi: Manitsawa diadia. Macro-Ge: Caraja uitira ‘green, blue’, Fulnio čičia ‘black’, Ge: Krenje teted ‘green’, Crengez ntetete ‘green’, Kaingang: Dalbergia čɨ ‘dark brown’, Kamakan: Kamakan hittu ‘green’, Cotoxo itiɬ ‘green’. COMMENTS: Manitsawa is a language of the Juruna branch, which also includes, e.g., Xipaya tinikĩ ‘black’. Other Tupi languages such as Wayoro have forms like tiktik ‘black’, which appear to be a better fit for this etymology. In any case, none of these words can be traced back to PT, as they are not widespread in the family but restricted to particular branches. As for Macro-Jê, Krenyê is strangely cited twice (with two different spellings for the language name). It is not possible to reconstruct a PJ word for ‘green’, but the Kaingang word that belongs here is ku-tɨ ‘dark’, from PJ *tɨk ‘black’ (ku- is a very productive prefix in the Jê languages). The other Macro-Jê words cited might indeed be cognates going back to PMJ. However, because a parallel word cannot be reconstructed for PT, the comparison between the families is not convincing.

94. BREAST1 Equatorial: Tupi: Proto-Tupi *kam. Macro-Ge: Botocudo kuã ‘inside’, kuaŋ ‘belly’, Ge: Cayapo kamaŋ ‘inside’, Krenje kamã, Kaingang: Tibagi ka ‘inside’, kan ‘inside’, Palmas kamme ‘inside’, Mashakali: Mashakali, Capoxo it-kematan ‘inside’, Macuni i-kematahi ‘inside’, Patasho e-kæp ‘inside’. COMMENTS: This is a valid etymology with a couple of mistakes. The PT form should rather be *ŋãm. As for Macro-Jê, the probable match is PMJ *kɤp (~ -ε-) ‘breast’, which can be reconstructed on the basis of PJ *kʌ and Maxakali kεp. As with ‘hand’, I will write more about this in the future, so I will refrain from further comments now.

102. BURN3 Equatorial: Proto-Tupi-Guarani *apɨ. Macro-Ge: Botocudo pek, Karaho puk, Erikbaktsa okpog(-maha), Yabuti: Arikapu pikö ‘fire’, Mashubi piku ‘fire’. COMMENTS: A rare case where the AED has a plausible etymology with almost no mistakes. We can go further back in time than PTG, as this root can be traced all the way to PT *pɨk’ ‘to burn’. The Macro-Jê set is correct and allows the reconstruction of PMJ *pok ‘to burn’. The only exception is Jabuti: the Proto-Jabuti reconstruction should be *pi-ʧə, from PMJ *ʃɯm ‘fire’, a different etymology altogether.

103. BURN3 Equatorial: Tupi: Sanamaica kaːi ‘fire’. Macro-Ge: Proto-Ge *ku-zɨ ‘fire’, Patasho köa ‘fire’, Macuni  ‘fire’, Mashakali ko ‘fire’, Kapasho ka ‘fire’. COMMENTS: This etymology illustrates the fundamental problem of arbitrary segmentation in the AED. Sanamaica is a Mondé dialect; in the Mondé languages, forms such as kãj or kãːj ‘to burn’ are found. They are cognate with PTG *kaj ‘to burn’, so we can reconstruct something like PT *kãj. The problem is the Macro-Jê set in this entry: the relevant part of the PJ word is *-zɨ (*-sɨ in more recent reconstructions), since the *ku- prefix is very common in the Jê languages. Although forms with a similar prefix appear in other Macro-Jê languages (e.g. Karajá hε-kɔ-dɨ), others include a different prefix (e.g. Ofaye ĩ-ʃɨw or the Jabuti word cited above). The PMJ reconstruction is *ʃɯm ‘fire’, an unlikely cognate of the PT root *kãj ‘to burn’.

150. COME1 Equatorial: Tupi: Arikem an ‘go’. Macro-Ge: Botocudo , Caraja anakre, Kamakan: Meniens ni (imperative), Mashakali: Mashakali nũn, Patasho nanæ. COMMENTS: The only Tupi language cited is Arikem, and I could not find a PT reconstruction for ‘to go’ that could have resulted in this word. As for Macro-Jê, there are two well-known PMJ roots meaning ‘to come, to go, to walk’: *tε̃C and *mɔ̃ŋ. The first seems to be at the origin of the Karajá word included in the entry (which is really a construction with the root r-a- ‘to go’ in the future tense, with the typical suffix -kre). This etymology is thus unconvincing.

157. COOK3 Equatorial: Tupi: Arua kaʔin ‘fire’. Macro-Ge: Ge: Piokobye kaho ‘fire’, Aponegricran koxʔho ‘fire’, Mehin kühü ‘fire’, Taje, Purekamekran kuhü ‘fire’, Karaho, Apinage kukuvu ‘fire’, Ramkokamekran kuxu ‘fire’. COMMENTS: This is one of those etymologies in the AED that are just funny to look at. A few lines above, I analysed precisely the same cognate set, which originates from PT *kãj ‘to burn’ on the one hand and from PMJ *ʃɯm ‘fire’ on the other. In this entry, all the Jê forms cited derive from PJ *ku-sɨ, which includes a prefix, hence the fortuitous resemblance with the Tupi forms. Somehow, for the authors of the AED, the same reconstructed proto-word in a family can derive from two different ‘proto-Amerind’ roots. Or rather, this does not matter to them, because they do not care about reconstructions. This situation can be found repeatedly in the AED.

178. DIE1 Equatorial: Proto-Tupi-Guarani *manõ, Oyampi mahẽ ‘dream’. Macro-Ge: Mashakali: Mashakali, Monosho monon ‘sleep’, Macuni moñung ‘sleep’, Capoxo, Kumanasho mono ‘sleep’, Opaie moye ‘die’, Puri: Coropo mamnon ‘sleep’. COMMENTS: The PTG root cannot be traced back to PT, as it is not found in any other branch. Two roots for ‘to die’ can be tentatively reconstructed for PT, *pap and *eʔã; for ‘to sleep’ we can reconstruct *kjet. As for the Macro-Jê part of the etymology, this is one of the most obvious cognate sets across the family. In the AED entry, the PJ root *j-õt has been left out, perhaps because it does not resemble the others very closely. Other cognates that were ignored are Karajá õrõ, Rikbaktsa uru and Proto-Jabuti *nũto. The Ofayé word for ‘to die’ is not a cognate (but jõr ‘to sleep’ is). The mo- prefix is not found outside the easternmost languages such as Maxakali. Thus, the PMJ reconstruction should be *ʃ-ɔ̃t [I will explain the ‘detachable’ ʃ- soon], unrelated to the PTG root presented in the AED or to any of the PT roots mentioned above.


Out of the ten etymologies, only two did not present major problems, and this judging only from the point of view of the two families (Tupi and Macro-Jê) analysed. Although Greenberg once argued that including many families would create some sort of ‘random error’, I believe it only multiplies the problems if a careful family-by-family evaluation is not carried out. Because this procedure was not followed, the AED does not present etymologies – words that can be carefully traced all the way back to a common ancestor – but rather ‘lookalikes’.

