Greenberg’s Amerind: the baby and the bathwater

Any map depicting a phenomenon at a global scale is doomed to lack detail. Some subtleties that might be important at a finer spatial scale simply cannot be represented, or are less relevant when we want to see the ‘big picture’. Unfortunately, that tends to be the case for world language maps: the Americas are very frequently represented with the classification proposed by Joseph Greenberg, either as three large groups (Eskimo, Na-Dené and Amerind) or broken down into lower-level macro-families (such as Equatorial-Tucanoan, Gê-Pano-Carib etc.). It is easy to understand why this is done: look at a linguistic map of Eurasia using a non-controversial classification (e.g. Turkic and Mongolic instead of Altaic) and you will see that a handful of families (a little over ten) cover the whole continent. If one does the same with the Americas, nearly 100 families are necessary. Put this side by side with the Old World and there will be a huge disparity in the map. We can create the illusion of balance between the Old and New Worlds by lumping the American families into larger groups, and that’s where Greenberg’s classification comes in handy. Even the venerable Cambridge Encyclopedia has adhered to that scheme in its maps, as have geneticists, archaeologists and others looking for a ‘minimalist’ classification of the American languages.

Amerind languages according to the classification in Greenberg and Ruhlen’s etymological dictionary.

Needless to say, Greenberg’s scheme is not taken seriously by the majority of linguists working in the Americas. Greenberg built his reputation by proposing a classification of only a few families for the African languages, diminishing the previous myriad of families. However, even his African ‘families’ were based on shared features like classifiers and their genetic validity has been questioned (why couldn’t they result from areal diffusion, for instance?). The problems with the Amerind classification are much worse, as well summarised by Lyle Campbell in a number of papers. Here, I will only mention that the etymological dictionary is full of errors, weird segmentation and words from dubious sources (wouldn’t you expect so in a survey of hundreds of languages carried out by one man?) – but it must also be pointed out that Greenberg’s notebooks make clear that he had already devised his classification before looking at the evidence, which was then used just to fill up his preconceived scheme. Let’s look at the longest etymology, one that has been presented as the definitive proof of Amerind: *t’ina ~ *t’ana ~ *t’una “son / child / daughter”.


Reflexes of this purported root are found everywhere in the Americas. The ablaut i / a / u supposedly indicates male, neutral and female genders, and the fact that words with “i” tend to refer to male relatives and those in “u” to female ones was used as evidence of this system having been transmitted intact during the colonisation of the Americas. I like this etymology and I actually find it somewhat convincing. Unfortunately, there are many problems – as you can see in the few examples above, it seems that in terms of meaning anything goes, from son, to brother, to wife. This is typical of mass comparison. Combine this with the fact that anything like T-N, TS-N, Z-N etc. counts, and it will be relatively easy to find cognates.
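To see why loose phonetic and semantic criteria make ‘cognates’ easy to find, here is a small simulation. It is only a sketch: the consonant inventory, the C1VC2V root shape and the idea of scanning several kin terms per language are all assumptions chosen for illustration, not data from any real language. A root ‘matches’ if it has any T-like onset followed by any N-like consonant, with any vowels, mirroring the T-N / TS-N / Z-N latitude described above.

```python
import random

# Hypothetical 20-consonant inventory (an assumption, not a real language).
CONSONANTS = ["p", "t", "k", "ts", "tʃ", "s", "z", "d", "m", "n",
              "ɲ", "r", "l", "w", "j", "h", "b", "g", "ʔ", "ŋ"]
VOWELS = ["i", "a", "u", "e", "o"]

# Lax matching criteria: any T-like onset plus any N-like second consonant.
T_LIKE = {"t", "ts", "tʃ", "s", "z", "d"}
N_LIKE = {"n", "ɲ", "ŋ"}


def random_root(rng):
    """Draw a random C1VC2V root uniformly from the hypothetical inventory."""
    return (rng.choice(CONSONANTS), rng.choice(VOWELS),
            rng.choice(CONSONANTS), rng.choice(VOWELS))


def matches(root):
    """True if the root fits the T-N template; vowels are ignored (any ablaut grade counts)."""
    c1, _, c2, _ = root
    return c1 in T_LIKE and c2 in N_LIKE


def chance_hit_rate(n_meanings, trials=50_000, seed=1):
    """Share of unrelated random 'languages' with at least one match among n_meanings kin terms."""
    rng = random.Random(seed)
    hits = sum(
        any(matches(random_root(rng)) for _ in range(n_meanings))
        for _ in range(trials)
    )
    return hits / trials


def exact_hit_rate(n_meanings):
    """Closed-form version of the same quantity: 1 - (1 - p)^n."""
    p = len(T_LIKE) / len(CONSONANTS) * len(N_LIKE) / len(CONSONANTS)
    return 1 - (1 - p) ** n_meanings


for n in (1, 4, 8):
    print(n, round(chance_hit_rate(n), 3), round(exact_hit_rate(n), 3))
```

Under these toy assumptions a single meaning gives a match only 4.5% of the time, but allowing eight kin meanings (son, daughter, child, brother, sister, wife, …) pushes the chance of a hit to roughly 30% per language, before any real phonotactic skew toward common consonants is even considered.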

The problems with Greenberg’s method should not discourage us from seeking deep relationships between the major American families. I believe many of them will ultimately be proven to be genetically related. One of the main criticisms of long-range comparison is that very few, if any, cognates would have survived after so many millennia. Also, they would not be recognisable as such due to heavy semantic and phonetic shifts (English two and Armenian erku are usually cited). But is that so? What about French dent and Hindi daant? nom and naam? mort and mrta? Perhaps these languages have extremely conservative phonologies… but what about Greek δόντι [though it’s pronounced /ðodi/] and Russian мёртвый /mertvij/? All these words are separated by 7,000 km and at least six millennia, yet they are still recognisable as cognates and, in some cases, have almost identical phonology. My favourite example, however, is Afro-Asiatic: by the time the oldest languages in this family were recorded (Egyptian and Akkadian), over 4,000 years ago, they were already very different languages, so the proto-language must have been spoken several millennia before that, maybe 10,000 years before present or even more. Yet we can still reconstruct Proto-Afro-Asiatic based on languages spoken today (Arabic, Somali, Hausa…). So why wouldn’t the same be true of Amerind?
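To put rough numbers on the ‘too few cognates’ objection, here is a sketch using the classical glottochronological model, in which each lineage independently keeps a fixed fraction r of its basic vocabulary per millennium, so two sister languages are expected to share about r^(2t) of the list after t millennia. The rate of 0.86 per millennium is the figure often quoted for the 100-item Swadesh list and is an assumption here; the constant-rate premise itself is widely rejected, so treat the output as an order of magnitude, not a date.

```python
# Assumed retention per millennium for a 100-item basic list (classical
# glottochronology figure, used here for illustration only).
RETENTION_PER_MILLENNIUM = 0.86


def shared_cognates(millennia, r=RETENTION_PER_MILLENNIUM):
    """Expected fraction of the list still cognate between two languages
    that split `millennia` thousand years ago; both lineages lose words
    independently, hence the exponent 2 * millennia."""
    return r ** (2 * millennia)


def expected_survivors(list_size, millennia, r=RETENTION_PER_MILLENNIUM):
    """Expected number of shared cognates on a list of `list_size` items."""
    return list_size * shared_cognates(millennia, r)


for t in (6, 10):
    print(f"{t} millennia: {shared_cognates(t):.1%} shared, "
          f"about {expected_survivors(100, t):.0f} of 100 items")
```

Even on this pessimistic model, something like 16 of 100 basic items survive after six millennia and around 5 after ten: few, but not zero, which is consistent with the Indo-European and Afro-Asiatic examples above.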


Let me give one example of deep relationships that are becoming quite obvious. Language families that Greenberg and Ruhlen subsumed under the ‘Equatorial-Tucanoan’ and ‘Gê-Pano-Carib’ groups have long been thought to be related. There are a number of potential cognates between Macro-Jê and Tupi that seem quite convincing, some of which are shown below. It is extremely important to compare the forms in the oldest possible reconstruction, not between the individual modern languages of each family; otherwise false cognates may be identified, or true cognates may not seem so close (as can be seen in the examples below). This is why many entries in the Amerind Etymological Dictionary are so embarrassing to look at: for example, the words for ‘eye’ in three different branches of Macro-Jê appear scattered over three different etymologies (supposedly deriving from Proto-Amerind roots 248 *kad, 250 *ere, 252 *hin) when, in reality, we know that they all go back to the same Proto-Macro-Jê root!

Some potential cognates between Proto-Macro-Jê (PMJ) and Proto-Tupi (PT). PJ = Proto-Jê. PMG = Proto-Mawetí-Guarani.

Even more intriguing than the shared vocabulary are some shared grammatical subtleties. For example, many Macro-Jê languages have a relativiser prefix: e.g. Suyá kʌtirεyε y-aykwa ‘the boy’s mouth’ vs. s-aykwa ‘his mouth’. Exactly the same happens in the Tupi-Guarani languages, e.g. Tupinambá aβa r-oβa ‘the men’s face’ vs. s-oβa ‘his face’. These shared ‘irregularities’ are some of the most promising evidence for deep genetic relationships.

Furthermore, there are similarities in the pronoun system. Most Macro-Jê languages use a variation of ĩ- / a- / i- (1st, 2nd and 3rd person) as prefixes. Although modern Tupi-Guarani languages use a different set of prefixes (a- / εrε- / o-), when we go back to the proto-languages, we see some reconstructed pronoun prefixes that look a lot like Macro-Jê (such as PMG *uj- / *e- / *i-, or Karitiana i- and a-). If we wanted to go really long-range in the comparisons, we could mention that the ĩ- / a- / i- series appears even in the Maya languages.

Maya possessive pronouns ni- / a- / y- (u-). First- and second-person pronouns are rare in the glyphic inscriptions. Modern Maya languages preserve this triad: Ch’ol has k- (x-) / a- / i- and Yucatec has in- / a- / u-.

In summary, before changing topics, there is good evidence for a deep genetic relationship between Macro-Jê and Tupi, to which we must add the Karib family. The closeness of these three families has been advocated by the Brazilian linguist Aryon Rodrigues and by others working in the field. This means that Greenberg and Ruhlen’s ‘Gê-Pano-Carib’ should rather be something like ‘Jê-Tupi-Karib’. They erroneously place Tupi in an ‘Equatorial’ group that includes Pano and Arawak (these families seem rather distant from the other three). I would say that the core idea of Amerind is right, i.e. that many of the large families of the Americas will be proven to be genetically related, but that it was substantiated by the wrong evidence. As the Jê-Tupi-Karib hypothesis has shown, work at the level of the proto-languages can yield more reliable sets of cognates.

More on pronouns: N / M vs. M / T

Since I mentioned personal pronouns, it must be said that these were long thought to prove the genetic unity of the Amerind languages. Sapir, and later Greenberg and Nichols, noticed that many Amerind languages use some form of n- for the 1st person and m- for the 2nd person. This would be present, for example (from north to south), in Sahaptin inim, Yokuts na: / ma:, Nahuatl no- / mo-, Quechua noqa / qam and Mapudungun iñche / eymi. Interestingly, the n : m pattern is particularly frequent in the western halves of both North and South America, coinciding with other areal features. This is an interesting distribution, and it even appears as one of the thematic maps of WALS. We can contrast this with the pair m : t that is prevalent in Eurasia, as I mentioned in a previous post. There has been criticism of the pronoun mass comparison, especially because consonants like n, m or t are unmarked (easy to pronounce) and tend to appear as grammatical particles in virtually every language, so it would be easy to arrive at the n : m pattern by coincidence. Still, I don’t think that explains why, by sheer coincidence, American languages would have chosen this specific pair, whereas Eurasian ones would have preferred m : t.
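The coincidence argument can be made concrete with a toy binomial model. In the sketch below, each language is assumed to pick its 1st- and 2nd-person markers independently and uniformly from a small pool of k unmarked consonants; restricting the pool is a way of granting the critics’ point about unmarked sounds. The pool size and the language counts are hypothetical placeholders, not figures from WALS or any survey.

```python
from math import comb


def p_specific_pair(k):
    """Chance that one language picks a given ordered pair (say n- for 1st person,
    m- for 2nd person) when both markers are drawn uniformly, without repetition,
    from a pool of k unmarked consonants."""
    return 1 / (k * (k - 1))


def p_at_least(x, n, p):
    """Binomial tail: chance that at least x of n independent languages show the pair."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x, n + 1))


# Hypothetical figures for illustration only:
K = 6              # e.g. a pool such as n, m, t, k, p, s
N_LANGUAGES = 100
OBSERVED = 20

p = p_specific_pair(K)
print(f"per-language chance: {p:.3f}, expected by chance: {N_LANGUAGES * p:.1f}")
print(f"P(at least {OBSERVED} of {N_LANGUAGES}): {p_at_least(OBSERVED, N_LANGUAGES, p):.2e}")
```

The tail probability comes out tiny, but the model leans entirely on independence between languages. Areal diffusion or shared descent violates exactly that assumption, so the calculation clarifies what each side must argue rather than settling the question.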

Agents and Patients across the Americas

I would say that the n : m evidence is compelling, but even more interesting is the shared morphology of the American languages. In fact, it has been proposed that language structure changes more slowly than vocabulary and may provide a better classification when the latter fails. Attempts at classification using structural elements have even been made at a global scale. Of course, there is much diversity in the Americas, but let’s take as an example the languages in the eastern half of the continent, those we can describe as 1) more prefixing than suffixing and 2) with an ergative alignment when it comes to marking pronouns on the verbs:


The examples above encompass a few of the major language families of the Americas. Maya is not exactly large, but it is historically important, and Maya glyphs are always nice to look at. However, it is very difficult to find glyphic examples with verbs in the 1st and 2nd persons. The sentence a-winak-e:n (“I am your man”) appears on a panel at Piedras Negras, spoken by a member of the court to his sovereign. The sentences from a modern Maya language, Ch’ol, provide a fuller range of examples.

The intention of the comparison above is to show how personal pronouns are marked as 1) possessives (“my wife”); 2) subjects of intransitive verbs (“I came”); 3) agents or subjects of transitive verbs (“I saw it”); 4) patients or objects of transitive verbs (“you hit me”). As you can see above, the adjectives in many Amerind languages really function as intransitive verbs (“to be happy”, “to be hot”…). Even more interestingly, noun predicates (“I am your man”) can be constructed in the same way as intransitive verbs. In general, Amerind languages that follow the pattern above use two sets of pronoun affixes for marking possessives, subjects, agents and patients. Most often, the possessive, subject and patient are marked with the same prefixes, whereas the agent is marked differently (e.g. Lakhota, Tupinambá). In other cases, like Cree and Ashaninka, it is only the patient that is marked differently. Curiously, in these languages, as in Ch’ol (and Classic Maya), the patient is suffixed (-in, -on, -na ‘me’). This pattern is extremely important: as we will see, it is present in ‘islands’ across Eurasia. I encourage you to explore the global distribution of such features with the World Atlas of Language Structures.
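The two patterns just described can be stated as a small decision rule over which affix set each role uses. The sketch below is a hypothetical helper, not a standard tool. It uses the usual typological labels S (intransitive subject), A (transitive agent) and P (transitive patient), with abstract set names instead of real affixes. It treats S as a single role, so it cannot represent split-intransitive (active-stative) systems of the kind often described for Lakhota.

```python
def classify_alignment(marking):
    """Classify verbal person marking from the affix set used for each role.

    `marking` maps "S", "A" and "P" (and optionally "possessor") to arbitrary
    set labels. S is treated as one role, so split-S systems are not modelled.
    """
    s, a, p = marking["S"], marking["A"], marking["P"]
    if s == a == p:
        return "neutral"      # one set for every role
    if s == p:
        return "ergative"     # S patterns with P, A stands apart
    if s == a:
        return "accusative"   # S patterns with A, P stands apart
    if a == p:
        return "horizontal"   # A and P share a set, S stands apart (rare)
    return "tripartite"       # three distinct sets


def roles_sharing_possessor_set(marking):
    """List the verbal roles whose affix set is also used for possessors."""
    return [role for role in ("S", "A", "P") if marking[role] == marking["possessor"]]


# Abstract labels following the two patterns described above (not real affixes).
LAKHOTA_TUPINAMBA_LIKE = {"possessor": "set1", "S": "set1", "P": "set1", "A": "set2"}
CREE_ASHANINKA_LIKE = {"possessor": "set1", "S": "set1", "A": "set1", "P": "set2"}

for name, pattern in [("Lakhota/Tupinambá-like", LAKHOTA_TUPINAMBA_LIKE),
                      ("Cree/Ashaninka-like", CREE_ASHANINKA_LIKE)]:
    print(name, classify_alignment(pattern), roles_sharing_possessor_set(pattern))
```

Under these simplifications the first pattern comes out ergative, with the possessor set shared by S and P, while the second comes out accusative, with only the patient standing apart.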

5 thoughts on “Greenberg’s Amerind: the baby and the bathwater”

  1. It’s a shame I discovered your series on Amerind from a SA-oriented perspective only today! It goes without saying that much bottom-up work remains to be done before we are in a position to critically assess those long-range etymologies, but I believe you are on the right track; after all, such work is being done.

    I must say I felt somewhat confused upon seeing my own outdated PMJ reconstructions. For example, it seems clear now that Krenák maintains the manner of articulation of PMJ codas, so the correct reconstruction in my current framework is *-ɔɲ ‘tooth’ (note that it might explain the nasalization in Tupían). I also failed to understand the reason that led you to modify your reconstruction of PMJ ‘tongue’: Central Jê is quite unambiguous about word-medial *-jt- (Southern Jê and Maxakalí seem to offer indirect evidence, while Karajá points to earlier *jɔrcɔ / *cɔrcɔ, which might be an instance of metathesis). There are some mistakes (typos?) in several synchronic forms you quote as well; the actual Maxakalí word for ‘name’ is tʃɨ=tʃet-ʔaj / ʔã=tʃet-ʔaj, a nominalization of the verb tʃɨ=tʃet / ʔã=tʃet (tʃɨ= and ʔã= somewhat ironically seem to be true relational prefixes; I have already submitted an article which includes a discussion of these morphemes).

    As for Tupían, a critical reevaluation of Rodrigues’s reconstructions is necessary (Meira & Drude 2015 is a first step, but I don’t always concur with them either). In recent months, I have become increasingly convinced that Rodrigues-style *tʼ, *tʃ/*ts, *kʸ and *kʼ should in fact be reconstructed as *j, *t, *k and *q respectively, whereas the correspondence TG *t ~ Sateré-Mawé *h ~ Mundurukú *t ~ … (the consonant that appears particularly often in words inflected for 3SG, also known as the non-contiguity (R²) relational prefix) must go back to something like PT *c or *ts. I might not have time to publish that in the next few months, but if you are interested, I would be more than happy to share my thoughts on PT consonantism. All in all, in that case Andean **q might in fact correspond to *q in Proto-Tupían, if your etymologies are correct.

    I am currently working on a hypothesis that would include MJ (Jê, Maxakalí, Krenák, Karajá, Ofayé, Rikbáktsa and Jabutí are core members, Bésɨro would be an outlier), Tupían, Boróro, Zamucoan, Mataco, Guaicurú and possibly Karíb and Karirí languages; not so sure about Arawák. The next families I intend to look at are Kamakã, Pano-Tacanan, Yanomámi and Purí (the latter two, as well as Karirí, share the muta cum liquida situation observed in MJ, namely that KR and PR occur while TR and CR do not). The first version of a MJ etymological dictionary is also under construction; let me know if you are interested in my updated reconstructions!

    Kind regards,


    1. Hi André,
      Thanks for your thorough reading of my materials on South America. I haven’t posted here in a while as I have been involved with other writing activities that take most of my time, and I will probably not go back to South American themes so soon. There are certainly mistakes, not typos, in the post, as I’m not a specialist in those languages. I will gladly add your corrections as comments to the original post.
      I believe Ribeiro (2002, 2011) has convincingly demonstrated that Kariri should be included in the Macro-Je stock. Apart from that, I agree (as you must have noticed) that all the families you listed are related at a deeper level, while Arawak is somewhat of an outlier in this lowland context. I have been thinking that this is due both to the geographic origins of Arawak as well as the manner of its spread, things that I have mentioned in some posts every now and then.
      I will be very interested to check the Macro-Je etymological dictionary when it’s ready!
      Best regards,


      1. Dear EmeKur,

        Eduardo Ribeiro has, indeed, pointed out some important similarities between Karirí and Macro-Jê, but none of them seems to be exclusive to these families. He provides a part of the paradigm of the stem meaning ‘fire’ in Kipeá and Paraná, and the similarities might seem striking; however, closer scrutiny reveals that the PMJ stem must have been, after all, disyllabic (*kucɯmᵊ in my current reconstruction; the similarity of the Ofayé and especially Jabutí roots must be fortuitous; Maxakalí, Karajá, Central Jê and Northern Jê all retain the initial syllable). PNJ unstressed *ku- regularly yields Pnr *i-, not necessarily in prefixes, and it is easy to see how Pnr isɨ might have been truncated to sɨ after the indirect possession morpheme. Lexical evidence is very scarce and includes the following:

        PMJ *krak ~ *krat ‘stone’, Kipeá (and Dzubukuá) kro ‘stone’ (the only exclusive MJ-Karirí isogloss I have been able to identify, though you provide some parallels from the Pacific side);
        PMJ *(-i)jiT ‘name’, Kipeá dze ‘name’ (hardly necessary to comment);
        PMJ *-ɔɲ ‘tooth’, Kipeá dza ‘tooth’ (present also in Tupí, Boróro, Guaicurú…);
        PMJ *pVrV ‘foot’, Kipeá bɨ(ri-) ‘foot’ (present also in Boróro);
        Kipeá bo(ro-) ‘arm’ might be related to the PMJ root for ‘arm’ or ‘hand’.

        I will briefly comment on some PMJ forms you use, with or without consequences to your assumptions.

        1. PMJ *nʌm ‘eye’ might be a phantom. It is uncertain if Karajá and Krenák forms are related to PJ *nʌm, as well as if *n was actually present in PMJ.

        2. PMJ *ʃ-ɔj ‘leaf’ is now reconstructed as *j-ocᵊ (3 *c-ocᵊ); note the voiceless reflex in Krenák.

        3. PMJ *ʃ-ɔj ‘tooth’ is now reconstructed as *j-ɔɲ (3 *c-ɔɲ); note the nasal reflex in Krenák
        3a. PJ *j-ua should be rewritten as *j-ɔ (3 *c-ɔ).

        4. PMJ *ŋot ‘louse’ is somewhat problematic. It is based on the comparison of PJ *ŋo with Maxakalí kɨt, which is technically impeccable. However, as I discovered no earlier than today, there is Krenák ŋəm ‘louse’, which is not comparable with Maxakalí because of the final consonant but nevertheless IS comparable with PJ *ŋo. This alternative comparison has one major advantage (a marvellous straightforward match in Tupían) and a couple of disadvantages: (1) PMJ *ŋom would be expected to yield Central Jê *kõmõ (utterance-medially) / *ku (utterance-finally), but the former form is not attested; this could easily be a gap in the existing descriptions, since most sources do not attest the form *kõmõ for the homonymous ‘tree / horn’; (2) as far as I remember, in Southern Jê an alveolar consonant is found in the derived transitive verb (PSJ *gʌ –> *ga-n, Kgg ŋga –> ŋã-n). I am inclined to think that the Krenák form is unrelated and the reconstruction remains PMJ *ŋot ~ *ŋon.

        5. PMJ *koj ‘river’ is a difficult one. The relevant data are:
        PNJ *ŋo(c) ‘water’, pointing to PJ *ŋoj ~ *ŋoc
        PCJ *kɨj ‘flowing water’, pointing to PJ *ŋɨj ~ *ŋɨc (or, less probably, *ŋɤj ~ *ŋɤc)
        PSJ *ŋoj# ‘water, river’, pointing to PJ *ŋuj
        PCJ *kuj ‘still water’ can correspond regularly to PSJ *ŋoj# (PJ *ŋuj) or to PNJ *ŋo(c) (PJ *ŋoj ~ *ŋoc), but not to both.
        Outside Jê, Maxakalí koj ‘river’ can correspond to something like PJ **ŋɔj ~ **ŋʌj ~ **ŋɔc ~ *ŋʌc.

        We can see that at least two distinct roots are at play (both contrast synchronically in Xavánte, not sure about Xerénte). It seems highly unlikely that there are FOUR distinct roots, but with our current knowledge of MJ historical phonology they are difficult to reconcile. In theory, the possibility of borrowing between Jê branches and Maxakalí cannot be discarded, but it seems more probable that we still don’t know something very basic about PMJ vocalism.

        6. PMJ *ʃ-(ij)it ‘name’ is now reconstructed as *…-jit ~ *…-jɨt; Karajá nĩ (< *jĩ) does not necessarily belong here.

        7. PMJ *ʃ-ɔ̃(ɲ)tʌj ‘tongue’ is now reconstructed as *ɲ-ɔ̃jtʌk (3 *c-ɔ̃jtʌk).

        8. PMJ *ʃ-VmV ‘blood’ is now reconstructed as *j-ɔpᵊ (3 *c-ɔpᵊ).

        9. PMJ *par ‘foot’ is better understood as an original disyllabic stem (*pVrV; eastern languages point to *para, whereas western languages disagree and point rather to *pʌrʌ, a discrepancy also found in the word for 'liver').

        10. PJ *ʃ-ɔ̃t ‘to sleep’ is now reconstructed as *ɔ̃rᵊ (nominalized: *ɲ-ɔ̃r, 3 *c-ɔ̃r). The alleged cognate in Jabutí is surprisingly similar to the Boróro form (underlying /nutu/) and is not easily reconcilable with other MJ roots.

        11. PMJ *krãj ‘head’ is now reconstructed as *kɾʌ̃ɲᵊ.

        12. Kgg paro is actually pɔ̃ɾɔ (orth. pãró) and means 'rock'; the compound *ɬe=parɔ 'thorax' is reconstructible for PSJ (Laklãnõ ze palu).

        13. "The Macro-Jê set is correct and allows the reconstruction of PMJ *pok ‘to burn’."
        In fact, Botocudo pek (if it is not a hapax) should also be excluded. Krenák historical phonology is mysterious in many aspects, but one thing that doesn't seem to happen there is fronting of back vowels (or backing of front vowels, for that matter). PJ *o corresponds to ə in Krenák, and I am aware of no source on Krenák that would transcribe /ə/ with "e".

        14. "*tε̃C and *mɔ̃ŋ. The first seems to be in the origin of the Karajá word included in the entry (which is really a construction with the root r-a- ‘to go’ in the future tense, with the typical suffix -kre)."
        I am afraid this was a bad solution of mine. As for now, I am not even sure the correspondence PJ *t ~ Karajá r (proposed by Ribeiro and, I think, by Davis) even exists.

        I could also try to comment on the Tupían reconstructions you use, but hardly today.

        One language family that seems to be missing in your texts at all is Cahuapanan. Rojas-Berscia's stance (which I guess you are familiar with) is that it is a characteristic representative of a pan-Andean pronoun pattern; its pronoun set is VERY similar to that of Puelche, and there is some scarce lexical evidence that would unite it with Puelche, Quechua and Aymara. My favorite is the word for 'tongue', almost identical to its counterparts in Arawak and Karirí.

        As for the Macro-Jê Etymological Dictionary, it is being compiled in Google Docs, so there is no need to wait until it is ready (knowing me, it could easily take 50 years; currently I am able to compile 1 cognate set per day). I might be uploading the first draft to soon, but otherwise feel free to ask me for a link.

        Best regards,


