Royal Rules on Rich Rectors

Last month, the Guildford Shakespeare Company put on a production of Richard II, a fascinating tale of political strife and the perils of having a leader lacking in competence when the country is in crisis. Sound familiar? In any case, this got me thinking about the name Richard and its many etymological links.

First, the name Richard itself. It was borrowed from French, but it didn’t start there. In fact, it is one of a number of French words that were borrowed from Germanic, deriving from Frankish *Rīkahard, meaning ‘hard/brave king’. The same source gives modern German Richard and, through the travels of the Goths and Vandals, also made its way into Spanish as Ricardo and Italian as Riccardo. The first part of this name, the *rīk- ‘ruler’ element, appears in other derivations such as German Reich and Dutch rijk, both meaning ‘empire’ or ‘kingdom’, and in English as the ‘domain, kingdom’ suffix -ry, as in Jewry ‘the Kingdom of the Jews’. A different derivation again gives us English rich, something you’d rather expect a king to be. As a component of names it is ubiquitous in Germanic, as in Old English Godric ‘God(ly) king’, Wulfric ‘Wolf-king’ and Theodric ‘King of the people’. This last one turns up in German as Dietrich and, again courtesy of the Franks, comes into English through French Thierry as Terry (see also my previous post on the Germans for more on this Theod-).

But it is not only Germanic languages that have this root. Indeed, some form of it crops up across the Indo-European language family, usually meaning something like ‘king’ or ‘ruler’. In Celtic (from which Germanic likely borrowed the rīk- words) we find, for example, rí in Irish and rhi in Welsh, both meaning ‘king’. In Gaulish, rulers such as Vercingetorix and Ambiorix had an earlier form, -rix, as part of their names, and in a reduced form we find the same element in the Welsh surname Tudor, originally meaning ‘ruler of the people’ and thus cognate with Theodric/Dietrich/Terry.

In Latin, too, we find rēx, again meaning ‘king’ or ‘ruler’. This form survives in many modern Romance languages, for example Spanish rey and French roi. English also has two separate adjectives from it: regal, from Latin, and royal, from French. Further afield, the word crops up as far away as India, in the form of Sanskrit rāja, once again a ‘king’ word, as well as rāṣṭrá, a ‘kingdom’.

All of these forms can be traced back to a form in Proto-Indo-European (the reconstructed ancestor of all of these languages), which we represent as *h3rḗǵs. In the terminology of Indo-European studies this is an ‘athematic root noun’: a short root without additional derivational suffixes, to which inflectional endings such as the nominative singular *-s are attached directly, rather than with an additional ‘theme vowel’ *-o- in between. As with many such forms in Proto-Indo-European, when we isolate the root itself, *h3reǵ-, which probably meant something like ‘stretch out the arm, direct’, we can find even more related derivations.

Adding a thematic vowel *-e/o-, we get a verb which shows up in Latin as regō ‘rule, govern, direct’, along with an array of derived nouns which we have in English: the agent noun rector, the instrument noun rule (from a French reflex of Latin rēgula) and the abstract noun regimen. Additionally, there are prefixed verbs such as dīrigō, ērigō and corrigō, which through their respective supine forms dīrēctum, ērēctum and corrēctum give us English ‘direct’, ‘erect’ and ‘correct’.

Germanic, meanwhile, provides us with a different set of reflexes of this root. Besides the rich set relating to wealth and kingship that we have already seen, the ‘straighten’ meaning of *h3reǵ- results in other interesting links. We have the (originally separate) verb and noun rake, a device for making straight lines, and the former participle right, originally meaning ‘straightened, directed’. Then we have reckon, perhaps a natural extension of the metaphor of lining things up in order to count them. Finally, from a causative ‘make straight’ we have reach (as if ‘straightening out one’s arm’).

This is the greatest joy of etymology for me: by untangling these webs of relationships, we can show how so much of our vocabulary results from variations upon a common root. It reminds us of the continual creativity involved in using language and, by extension, the creativity of language users, i.e. humans.

Linguistic fieldwork in the Russian Federation

Surrey Morphology Group, despite being a relatively small research group, nevertheless conducts linguistic fieldwork on all (inhabited) continents. Countries where members have worked over the years include Australia, Bulgaria, Canada, Colombia, Kenya, Mexico, Namibia, Nepal, Nigeria, Russia, Serbia, and Vanuatu. Fieldwork in every region has its peculiarities, not necessarily connected to the linguistic properties of the language(s) studied, and it is the peculiarities of one such region which I would like to discuss today.

My personal fieldwork experience has involved several different regions of Russia, in the republics of Daghestan, Mari-El, Komi and Khakassia. Each of these regions has been fascinating in its own way, but Daghestan takes the lion’s share of the fieldwork I do. It is a mountainous region in the south of Russia stretching from the Caspian Sea to the Caucasus. It has borders with Azerbaijan and Georgia to the south, and within the Russian Federation it is next door to Chechnya. Medieval geographers described the Caucasus as “a mountain of tongues”, and with good reason. There are over forty languages spoken in this relatively small territory (just 50,300 sq km), and most of the linguistic diversity lies within an even smaller mountainous region in the south of Daghestan, involving languages of the indigenous Nakh-Daghestanian family.

I have written before about the linguistic interest of the language I have worked with most, Archi (in many respects a typical representative of the family), but today I want to talk about the social and cultural aspects of the work.

Culturally, Daghestan is a relatively homogeneous region; traditionally people lived in small villages, bred sheep and grew sturdy grains like rye and barley. Before the 20th century, many villages were organised as follows: there was one central village where people got together during the summer months, while the sheep were in the alpine pastures and did not need shelter at night, and in the winter months the people would go to smaller hamlets where the sheep (split into smaller groups) were kept in the houses or in underground sheepfolds made in caves. The name for these winter sheepfolds is the same across several Daghestanian languages, so we can safely assume this was common practice for a long time.

Daghestanian shepherding

After the Revolution of 1917 and the creation of the Soviet Union, many people got the opportunity to drive their sheep to regions with a milder climate near the Caspian Sea, and these shepherding practices ceased to exist. The smaller hamlets either disappeared or grew into proper villages, and in the latter case developed some dialectal differences. People like to point out those differences, but at the same time they still often perceive the conglomerate of the central village and the “hamlets” (which in some cases are even larger than the central village) as a single village.

Besides sheep breeding, Daghestanian people grew grain, and traditionally they would roast grains and make flour out of them. That flour can be mixed with water and then eaten directly, and in some villages they still make this “shepherd’s food” (they call it “old time instant noodles”). There were also many traditional crafts, among which are the silver products of Kubachi, wood inlaid with silver from Untsukul, Lezgian knitted slippers and earthenware from Balkhar.

From a sociolinguistic point of view, the Daghestanian languages were in a much better state during the 20th century than many other smaller languages of Russia. Although only a handful of Daghestanian languages were recognised by the state and therefore taught at school, children in smaller language communities remained monolingual until well into their teens. Most Daghestanian people belonging to smaller language groups also speak the language of a larger Daghestanian neighbour (such as Avar, Dargwa or Lezgian) and one national language, whether Russian, Azeri or Georgian, although over the last 50 years Russian has steadily been replacing the others. The first thing that strikes a linguist arriving in Daghestan (especially one with experience of working with small languages in other parts of the Russian Federation) is how proud the people are of their languages, how ready they are to share them and how much delight they take in their complexity. Indeed, since they all speak at least one other language, they can see for themselves that their own languages are more complex, at least phonetically (Archi, for example, has 70 consonants).

Some places in Daghestan have kept their traditional ways better than others: thus, in 2004, when I first came to Archi, I was really fascinated to see many women wearing traditional clothes and jewellery not only on special occasions but every day.

Living in people’s houses, I could see that they used traditional cots for babies and had retained most of the old practices connected with childbirth. For example, right before having her first baby, the woman goes to her mother’s house and stays there for the first 40 days of the baby’s life, being completely looked after (very often she just stays in bed). After 40 days, she moves back to her and her husband’s house in a very colourful procession: the whole thing is called “moving of the cot”.

Moving the cot

But maybe the most important cultural characteristic of Daghestan is the living cultural practice of protecting one’s guests. Stemming from old times when travelling in the Caucasus mountains was not always safe, if one happens to come to a Daghestanian village one will be invited into a house, given food and shelter and will become kunaks with the master of the house. Kunak is not easy to translate. It means ‘guest’, but also ‘friend’. So I can say “I have a kunak in that village” meaning there is a person there who once was my guest (or vice versa) and now we are friends, so I can always count on having food and shelter in his house as much as he can in mine. In former times it was a duty for the master of the house to protect his kunak such that if anything were to happen to him, the perpetrator of the bad deed would answer to the house where the guest was staying. This system is still very much alive in Daghestan, and once I had eaten or slept in somebody’s house, I knew that I would be safe in that village and probably the neighbouring ones too.

 

Whisky Galore and A Go Go!

When it comes to etymology, most words have a somewhat mundane route into a language: they are either inherited from a direct ancestor or borrowed at some point from another language. Borrowed words tend to come in batches, either through an intensive period of contact between peoples, as with the Old Norse loans into English, or through the importation of vocabulary related to aspects of culture borrowed from the group in question, such as the legal terms derived from the French used in English courts after the Norman Conquest.

However, every so often, there come along lexical items with a significantly more complex and idiosyncratic path into a language, and occasionally words may interplay with one another in interesting ways. We find such a complex interplay with galore and agogo.

Galore is already an interesting form by itself, as it is one of a small number of loanwords from Gaelic (likely specifically Scottish gu leòr) which do not have any connection with Gaelic culture or geography. The Gaelic expression can mean either ‘enough’ or ‘much, plenty’, and occurs in several constructions as a result. For instance, in Scottish Gaelic, when asked ‘how are you?’, one might respond ceart gu leòr ‘all right, OK’, literally ‘right enough’.

This phrase, in a number of varying spellings such as gilore or gallore, appears to have begun to enter English in the mid-17th century (or at least this is the date of the earliest citation in the Oxford English Dictionary). When it was borrowed into English it underwent semantic shift and narrowing, coming to mean specifically ‘in abundance, plenty’ and losing the sense of ‘enough’. It seems to have been somewhat colloquial, not particularly frequent in writing, and it is disproportionately concentrated in Scottish works, including an attestation in the journals of Walter Scott.

The word reached its greatest prominence in English through Compton Mackenzie’s novel Whisky Galore! and the later Ealing comedy of the same name. Both the novel and the film centre on a remote Scottish island, and the novel in particular makes use of Gaelic throughout, so the use of ‘galore’ fits in well with the setting.

This work, however, had a more interesting impact than simple popularity. As with many best-selling works, it was translated into other languages, and the French translation was titled Whisky à-Gogo (French à gogo means ‘in abundance’ and likely derives from Old French gogue ‘fun’). This title was then itself used as the name of a nightclub in Paris, the world’s first discothèque. The concept rapidly grew in popularity, with Whisky à-Gogo venues spreading across the globe, as far as Papeete in Tahiti (and Cardiff!), the most famous probably being the Whisky a Go Go on Sunset Strip in Hollywood. (In the English-speaking world gogo got split into two, possibly by false analogy with the verb ‘go’.)

A film poster for the film 'Roadrunner a go-go'
But there’s only one Roadrunner…

From here on, ‘a go go’, or just ‘go go’, became a by-word for everything hip and cool (or ‘groovy’) in the 1960s. Go-go dancers dance in go-go clubs, of course, but the meaning became more and more nebulous over time. In cinema, 1965 was a banner year, with Roadrunner a go-go up against Monster a go-go. Around the same time came an unsuccessful attempt to extend this word (or is it a phrase?) by analogy, with the notorious Batman parody Rat Pfink a Boo Boo. Nobody seems to have got this (not terribly good) joke, and on subsequent reissues the film was “corrected” to Rat Pfink and Boo Boo. (You’re reading this etymology here first. Even the director who came up with the title didn’t realize it, but we’re linguists, we know better.) But the shelf life of terms denoting popular trends is short, and anyone using it now probably intends it to lend an antiquated flavour of the swinging 60s. Contrast this with galore, which retains its more generic use and seems unlikely to drop out of common usage in the near future.

Who are the Germans?

You may be familiar with the fact that the Germans refer to themselves as Deutsch and their country as Deutschland, and this term is also found in most other Germanic languages, such as Dutch Duits or Swedish Tysk, as well as in Italian Tedesco. However, there are many other names in other parts of Europe. The French and Spaniards call them Allemand/Alemán, as do the Welsh with Almaenaidd; the various Slavic languages share a different term again, seen in e.g. Polish Niemiec or Russian Nemets. In the Baltic, the Lithuanians and Latvians have their own terms not seen anywhere else (Vokietis and Vācietis respectively), while in Finland and Estonia they call them Saksi. We could also add some assorted forms from smaller languages, such as Miksas from Old Prussian, an extinct sister language of Lithuanian and Latvian.

An aerial shot of the meeting of the Rhine and Mosel rivers at Koblenz
The Deutsches Eck, or ‘German corner’, in Koblenz

Now, it is not unusual for the inhabitants of a country to refer to themselves and their country with a different form from that used by outsiders (when was the last time you called China Zhongguo or India Bharat?). What is particularly notable about the German case, however, is the diversity even among its immediate neighbours. Contrast this with France, for which everyone uses some derivative of Latin Francia (after the Germanic tribe the Franks), though the Greeks still call it Gallia after the Roman province of Gaul. Similarly, most languages call Spain something derived from Hispania and Italy something derived from Italia. So this diversity in names for the Germans requires some explanation.

Whence this plethora of terms? A consideration of history provides the answer. Recall that the modern country of Germany is a relatively recent creation, only officially unified in 1871 under Otto von Bismarck. While there was a political entity covering the area in the form of the Holy Roman Empire, it was only a relatively loose collection of small states, and before that the area was inhabited by a number of distinct Germanic-speaking peoples.

As a result, some of these names derive from individual groups or tribes which lived in part of the area: in the Western Romance and Brittonic Celtic languages, the name of the Alemanni tribe was applied to the Germans as a whole. The same process occurred in the northeast with the Baltic Finns and the Saxons: not only were the Saxons the nearest group, but also, due to a combination of the Hanseatic League controlling trade through the Baltic and the anti-pagan crusading of the Teutonic Knights (another Deutsch-relative, see below), many Saxons came to settle in the Eastern Baltic, with some of their descendants still living in Estonia and Latvia today. Some smaller Germanic varieties point to a different group again, using a form derived from Prussian, after the state which ended up uniting the German peoples.

English takes a slightly different approach, deriving the term Germans from the Latin name of the region: Germania. This term included two Roman provinces covering much of modern-day Belgium, Switzerland, parts of eastern France and the Rhineland in modern Germany, and also applied to the larger swathe of barbarian territory further east. Interestingly, several languages use this term to refer to Germany the country despite using a different term for the Germans: Italian and Russian are the most notable examples.

We find a different source again in the Slavic Nemets terms. There is again some dispute about the origin, but the general consensus is that it derives from a Slavic root *němъ meaning ‘mute’, itself of contested origin. The meaning was likely not literally ‘mute’, but simply denoted that these groups did not speak Slavic. This in fact puts it in a similar category to the word ‘barbarian’, which derives from a Greek word meaning ‘those who go bar-bar/talk incomprehensibly’. Similar origins to do with ‘talking’ are likely behind the Baltic Vok-/Vāc-/Miks- forms as well.

Finally, what of German ‘Deutsch’? As is the case with many endonyms, its etymology is relatively simple and self-referential. It ultimately derives from an Indo-European root *tewteh2 meaning simply ‘people’, which also shows up in e.g. Irish túath with the same meaning. This form may also be the source of Romance forms such as Spanish todo or French tout meaning ‘everyone/everything’. The root even survives in Slavic, giving Russian čužoj ‘foreign, alien’. In Germanic it ended up as *þeudō, which through an adjectival formation *þiudiskaz, meaning something like ‘of the people’, ultimately leads to the modern German form. The same root also gives Latin Teutones, the name of a tribe, likely Celtic or Germanic, which lived in the North German region and was encountered by the Romans early in their expansion northwards.

So, as with many other terms, such as the aubergine words which have been discussed here before, the differences between languages are reflective of a complex history. In this case the wide array of disparate terms of different etymologies reflects the complex history of the entity involved, specifically the absence of a country that even called itself ‘Germany’ until the modern era, as well as the extent to which different groups of ethnic Germans have moved about in Europe.

Is twote the past of tweet?

Have you ever encountered the form twote as a past tense of the verb to tweet? It is something of a meme on Twitter, and a live example of analogy (and its mysteries). However surprising the form may sound if you have never encountered it, the official @Twitter account has been prescribing it for a long time.

Ten years later, the question popped up among a linguisty Twitter crowd, where a poll again elected twote as the correct form.

Clearly, this unusual form replacing tweeted is something of a joke, but why specifically twote? Here and there I saw references to the verb to yeet, a slang verb very popular on the internet, meaning more or less “to throw”. Rather than a regular form yeeted, the past of to yeet is often taken to be yote. The choice of an irregular form is probably meant to produce a comedic effect.

This, precisely, is analogical production: creating a new form (twote) by extending a contrast seen in other words (yeet/yote). Analogy is a central topic in my research. I have been trying to answer questions such as: How do we decide what form to use? How difficult is it to guess? How does this contribute to language change?


To investigate further why we would say twote rather than tweeted, I took out my PhD software (Qumin). Based on 6064 examples of English verbs[1], I asked Qumin to produce and rank possible past forms of tweet[2]. To do so, it read through the examples to construct analogical rules (I call them patterns), then evaluated the probability of each rule among the words which sound like tweet.
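To make the idea of patterns a little more concrete, here is a toy sketch in Python. It is not Qumin and does not reproduce its figures: the lexicon is a tiny hand-picked list, it works on spelling rather than sounds, ‘sounding like tweet’ is crudely approximated as ‘ending in the same letter’, and each candidate’s probability is simply the share of similar verbs whose pattern produces it. The names `pattern`, `apply_pattern`, `rank_pasts` and `LEXICON` are hypothetical.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical toy lexicon of present -> past forms, in spelling.
# Qumin itself works on phonemic transcriptions of thousands of verbs.
LEXICON = {
    "greet": "greeted", "treat": "treated", "heat": "heated",
    "cheat": "cheated", "seat": "seated", "wait": "waited",
    "meet": "met", "beat": "beat", "cut": "cut",
    "yeet": "yote", "shoot": "shot", "light": "lit",
    "walk": "walked",
}


def pattern(present: str, past: str) -> tuple[str, str]:
    """Describe an alternation as (ending removed, ending added),
    after stripping the longest common prefix of the two forms."""
    i = 0
    while i < min(len(present), len(past)) and present[i] == past[i]:
        i += 1
    return present[i:], past[i:]


def apply_pattern(pat: tuple[str, str], word: str) -> str | None:
    """Apply a pattern to a word; return None if the word lacks the ending."""
    old, new = pat
    if not word.endswith(old):
        return None
    return word[: len(word) - len(old)] + new


def rank_pasts(target: str, lexicon: dict[str, str]) -> list[tuple[str, float]]:
    """Rank candidate pasts of `target`, least likely first."""
    votes: Counter[str] = Counter()
    for present, past in lexicon.items():
        if present[-1] != target[-1]:
            continue  # crude stand-in for "sounds like the target"
        candidate = apply_pattern(pattern(present, past), target)
        if candidate is not None:  # the pattern fits the target
            votes[candidate] += 1
    if not votes:
        return []
    total = sum(votes.values())
    return sorted(((c, n / total) for c, n in votes.items()),
                  key=lambda pair: (pair[1], pair[0]))


print(rank_pasts("tweet", LEXICON))
# [('twet', 0.1), ('twote', 0.1), ('tweet', 0.2), ('tweeted', 0.6)]
```

The same four candidates emerge as in the real analysis, but from made-up counts, so the numbers should not be compared with Qumin’s. Note also how patterns that do not fit the target, such as shoot/shot or light/lit, simply produce no candidate for tweet.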

Qumin found four options[3]: tweeted (/twiːtɪd/), by analogy with 32 similar words such as greet/greeted; twet (/twɛt/), by analogy with words like meet/met; tweet (/twiːt/), by analogy with words like beat/beat; and finally twote (/twəˑʊt/), by analogy with yeet/yote. Figure 1 gives their ranking (in ascending order) according to Qumin, with the associated probabilities.

twote 0.028 < tweet 0.056 = twet 0.056 < tweeted 0.86
Figure 1. Qumin’s ranking of the probability for potential past forms of to tweet

As we can see, Qumin finds twote to be the least likely solution. This is a reasonable position overall (indeed, tweeted is the regular form), so why would both the official Twitter account and many Twitter users (including several linguists) prefer twote to tweeted?

The answer is that Qumin has no idea what is cool, a factor which makes yeet/yote (already a slang word used on the internet) a particularly appealing choice. Moreover, Qumin has no access to semantic similarity, which could also play a role: verbs with similar meanings may be preferred as models for the analogy. In the current case, both speak/spoke and write/wrote have pasts similar to twote, which might help make it sound acceptable. Some speakers seem to be aware of these factors.

What about usage?

Are most speakers aware of the variant twote and using it? Before concluding that the model is mistaken, we need to observe what speakers actually use. Indeed, only usage truly determines “what is the past of tweet”. For this, I turn to (automatically) sifting through Twitter data.

Speakers must choose between tweeted and twote: what a dilemma!

There are a few problems. First, the form “tweet” is also a noun, and identical to the present tense of the verb. Second, “twet” is attested (sometimes as “twett”), but mostly as a synonym for the noun “tweet” (often in a playful “lolcat” style) or as a present verbal form, with a few exceptions, usually of a meta nature. I couldn’t find a way to automatically distinguish these from past forms while staying within the Twitter API limits, so I left both out of the search entirely. This leaves only our two main contestants.

 

I extracted as many recent tweets containing tweeted or twote as Twitter would let me: around 300 000 tweets twotten between the 26th of August and the 3rd of September. 186 777 tweets remained after refining the search[4]. Of these, fewer than 5000 contain twote:

Figure 2. Counts of tweets containing either of the two possible pasts of the verb “to tweet” on Twitter over the past few days (mentions excluded): more than 180 000 for tweeted, fewer than 5000 for twote.

As you can see, the tweeted bar completely dwarfs the other one. However amusing and fitting twote may be, and despite @Twitter’s prescription (but in line with Qumin’s prediction), the regular past form is by far the most used, even on the platform itself, which lends itself to playful and impactful statements. This neatly settles this particular English Past Tense Debate. If only it were always this simple!

  1. The English verb data I used includes only the present and past tenses, and is derived from the CELEX2 dataset, as used in my PhD dissertation, manually supplemented with the forms for “yeet”. The CELEX2 dataset is commercial, and I cannot distribute it.
  2. The code I used for this blog post is available here, but not the dataset itself. Note that for scientific reasons I won’t discuss here, this software works on sounds, not orthography.
  3. One last possibility has been ignored by this polite software, a form which follows the pattern of sit/sat. I see it used from time to time for its comic effect, but it does not seem at all frequent enough to be a real contestant (and I do not recommend searching for this keyword on Twitter).
  4. Since there has been a lot of discussion of the correct form, I exclude all clear cases of mentions. I count as mentions any occurrences wrapped in quotation marks, co-occurring with alternate forms, mentioning the past tense, or with a hashtag. Moreover, with the forms in -ed, it is likely that the past participle would be identical, but for twote, the past participle could well be twotten. To reduce the bias due to the presence of more past participles in the usage of tweeted, I also exclude all contexts where the word is preceded by the auxiliary forms has, have, had, is, are, was or were, possibly separated by an adverb.
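The exclusion criteria in footnote 4 can be approximated with a few regular expressions. The following Python sketch is hypothetical, not the code behind the counts above: it assumes each tweet is a plain string, reads ‘with a hashtag’ as the form itself being hashtagged, and approximates ‘possibly separated by an adverb’ as ‘at most one intervening word’. The names `is_mention`, `keep`, `count_uses` and the sample tweets are invented for illustration.

```python
import re

FORMS = ("tweeted", "twote")
# Auxiliaries listed in footnote 4; a form right after them is likely a participle.
AUX = r"(?:has|have|had|is|are|was|were)"


def is_mention(text: str, form: str) -> bool:
    """True if `form` seems to be talked about rather than used (text is lowercased)."""
    others = [f for f in FORMS if f != form]
    return bool(
        re.search(rf"[\"“‘']{form}[\"”’']", text)             # in quotation marks
        or any(re.search(rf"\b{o}\b", text) for o in others)  # competing form present
        or "past tense" in text                                # talks about the tense
        or re.search(rf"#{form}\b", text)                      # used as a hashtag
    )


def keep(text: str, form: str) -> bool:
    """True if `text` seems to contain a genuine past-tense use of `form`."""
    text = text.lower()
    if not re.search(rf"\b{form}\b", text):
        return False
    if is_mention(text, form):
        return False
    # Exclude likely participles: an auxiliary, then at most one other word.
    return not re.search(rf"\b{AUX}(?:\s+\w+)?\s+{form}\b", text)


def count_uses(tweets: list) -> dict:
    """Count the tweets that keep each form."""
    return {form: sum(keep(t, form) for t in tweets) for form in FORMS}


SAMPLE = [  # invented examples, not real tweets
    "I tweeted about it yesterday",
    "She twote that",
    'Is it "twote" or tweeted?',
    "This was tweeted by the president",
    "#twote is the past tense",
    "It has been tweeted a lot",
]
print(count_uses(SAMPLE))  # {'tweeted': 1, 'twote': 1}
```

The one-intervening-word rule is deliberately crude: it would also throw away a genuine past such as “everyone who was there tweeted about it”, so counts from a filter like this only approximate the procedure described in the footnote.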
History on the Ground

Linguists spend most of the year stuck to the computer monitor: analyzing data, reading, or writing papers. But the time comes when we have to roll up our sleeves and find our sense of adventure. Personally, this is my favorite time of the year! Going on a linguistic field trip often involves living in a local community and immersing yourself in a completely different culture. You learn so much about the customs, traditions, beliefs… And, of course, you learn a lot about the language.

A sunny day, with a number of houses amongst trees on a hillside.
South-Eastern Serbia, one of our fieldwork destinations

Linguistic field trips are essential for researchers who work with poorly documented languages. We prepare questionnaires, design experiments, and go to the local community to collect the data that we need for our research. Here at SMG, we do a lot of fieldwork. You can get a glimpse of this fascinating part of a linguist’s life in some of our previous posts. Check out, for example, this beautiful piece on Archi, this account of cultural and language diversity in the South Pacific, and the most recent post about intricate ways to express respect in Vanuatu.

However, there seems to be a limitation. What should you do if you study the history of some phenomenon? If you are not only interested in how the system is now, but also in how it was before? We cannot jump into a time machine and reemerge in the 15th-century world to run our questionnaires there. So surely historical texts and comparative grammars are the only way to go, and fieldwork is not useful here… Or is it? Well, it turns out that a field trip can be very helpful for extrapolating historical data, but only if you are lucky with the location. Fortunately, I am!

A low-lying pile of stones on a roadside with a line of hills in the background
My lucky place

In the project “Declining case: Inflectional loss in progress”, my colleagues and I study dialects of Serbian and Bulgarian. These two languages have been giving linguists a headache for more than a century. Although they are closely related and spoken side by side, they differ significantly in their grammatical structure. To name a few differences: (1) Bulgarian has articles (like English a and the), while Serbian does not; (2) Serbian uses infinitives, while Bulgarian does not; and (3) Serbian nouns have a fully-fledged morphological case system, while Bulgarian nouns do not inflect for case at all. People still argue about the exact reasons for this, but there is a general consensus that Bulgarian has undergone certain changes because it is located in the so-called Balkan linguistic area. A cool thing is that there is no sharp border between the innovative grammatical system of Bulgarian and the conservative system of Serbian. Rather, in the geographical zone on both sides of the Serbian-Bulgarian border, we see a variety of intermediate systems.

A map of eastern Serbia and Western Bulgaria, with an area of hatching straddling the border between the two countries

Let us see how this works, using the example of case inflection, which we study in our project. Some languages use cases to mark grammatical relations, such as subject or object. Serbian does this, while Bulgarian uses prepositions instead. Take a look at this table, where the word ‘Cyprus’ appears in different contexts. See how in Serbian the word changes its ending depending on the context, while in Bulgarian it keeps the same form? Just like in English!

Serbian | Bulgarian | Translation
vole Kipar | xaresvat Kipâr | ‘They like Cyprus’
stanovništvo Kipra | naselenieto na Kipâr | ‘The population of Cyprus’
pomažu Kipru | pomagat na Kipâr | ‘They help Cyprus’
upravljaju Kiprom | upravljat Kipâr | ‘They govern Cyprus’

Overall, Serbian has six cases, while Bulgarian uses one general form. So, what do we see in the transitional zone? Well, depending on where exactly we look, we find different systems. For example, in the western part of the transitional area we find a system with three cases, while further east there is a two-case system.

Serbian | Transitional system 1 | Transitional system 2 | Bulgarian
Case 1 | Case 1 | Case 1 | No case
Case 2 | Case 2 | Case 2 |
Case 3 | Case 3 | |
Case 4 | | |
Case 5 | | |
Case 6 | | |

What does this mean? It looks as though standard Bulgarian at some point in its development lost case marking on nouns completely, while the dialects in the transitional zone underwent this change to a lesser degree. The further west we move, the less this change affected the dialect. This situation created an unprecedented opportunity for us. We can go to different places in the transitional zone, compare their systems with each other, and use this comparison to create a historical model of the loss of case. We do not need a time machine: we have the different stages of this process living side by side today!

This summer, for example, I went to the municipality of Brus, in southern Serbia. There I witnessed the initial stages of case decline: people in Brus still use all six cases, but sometimes replace one with another, or insert a preposition in phrases where standard Serbian would not have one. While interviewing people, I learned about some fascinating traditions in this area. For example, one of the oldest wedding customs is to place an apple at the highest point in the backyard, and the groom has to shoot the apple with a gun. If he fails to do so, he is not going to get his bride!

An apple hanging by a thread from the bough of a tree.

Apparently, in earlier times, a wedding would last for several days and involve all sorts of rituals. Unfortunately, most of them are lost now. It would be so nice to see how a wedding was celebrated then! But for this, I am afraid, we do need a time machine.

Careful who you climb a tree near: Respect and taboo in Vanuatu


One humid afternoon, during breadfruit season in North Ambrym, my language teacher, Isaiah, and I were on the lookout for some ripe breadfruit to roast for lunch. Our path led past the house of his nephew, George. Isaiah saw some ripe breadfruit in the tree next to where George was sitting on his veranda. Isaiah wanted to get the breadfruit, but said that because George was there, he couldn’t, and we would have to find some elsewhere. I asked whether it was George’s breadfruit tree, and whether that was why he didn’t want to take the fruit while George was around. Isaiah said no; the problem was that if we climbed the tree while George was underneath, Isaiah would have to pay a small fine to George. Over a lunch of roasted and pounded breadfruit called wuwu, Isaiah explained further. It had to do with respect and taboo.

Respect in language takes many forms. There is the tu/vous distinction in French, where tu is the informal form of ‘you (singular)’ and is used with friends and people younger than you, whereas vous, the ‘you (plural)’ form, also serves as the formal singular and is used with people older or more senior than you and with people you don’t know. A similar distinction is found in German du/Sie. English doesn’t have a grammatical politeness distinction like this, but uses different sentence structures to express politeness: compare pass me the salt please with could you please pass me the salt, or the even more polite would you be so kind as to pass me the salt please.

Now let’s get back to eating that heavy, sticky, coconut-cream-slathered wuwu with Isaiah. He told me that you must respect certain members of your extended family by showing physical politeness. Respect is translated as tengnean in the language of North Ambrym. The people whom you must respect are your taboo family, described by the verb gorrne. Respect for your taboo family on Ambrym is realised in different ways: through physical restrictions and through language. The family members who command the most respect are your sister’s son and your husband’s brother.

The physical restrictions with a taboo relative include:

  • You can’t eat in front of them
  • You can’t joke with them
  • You can’t climb over them, or be physically higher than them
  • You can’t sleep in front of them
  • You can’t enter their house

But what about restrictions on language? The normal translation of ‘hello’ in North Ambrym would be neng le, which literally means ‘you there’, using neng, the singular form of ‘you’. But you are not allowed to say this to your taboo relatives. Instead, you must say gōmōro le using the dual form of ‘you’, meaning ‘you two there’, even though you are addressing one person. This is similar to the French and German strategy mentioned earlier of politely addressing one person with a non-singular pronoun. However, North Ambrym, like many Oceanic languages, has not only singular and dual pronouns, but also paucal (meaning ‘a few’) and plural ones. Of these possibilities, it is the dual that is used for respect, not the plural as in French or German.

Respect is not confined to pronouns such as ‘you’; people also have to avoid using certain words in front of their taboo relatives. For example, if your sister’s son came, and you invited him to sit down and have some food, you would have to avoid certain verbs, such as taa ‘sit’ or ngene ‘eat’. You would use lingi ‘put’ instead of ‘sit’ and tewene ‘make’ instead of ‘eat’ so the whole sentence would be rephrased as ‘you-two come and put your-dual-self here and make the food’.

You must also avoid certain words concerning body parts, specifically words relating to parts of the head. Normally when talking about body parts in North Ambrym you would use a bound noun – a type of noun which specifies who owns the body part – so the word for ‘tooth’ would be lowo-n ‘his/her tooth’, lowo-m ‘your tooth’, or lowo-ng ‘my tooth’. The end of the noun (-n/-m/-ng in this example) indicates whose tooth it is. But these words are not allowed when talking in front of your taboo relatives. Instead, you could use a free form of the noun, such as leo ‘tooth’.

Another avoidance strategy is to change a verb into a noun using a special nominalising prefix a-, which attaches to the beginning of the word. The verb itself is also reduplicated. For example, the verb ta ‘cut’ can be turned into the noun atata ‘tooth’ (literally ‘thing for cutting’).

Finally, a more idiomatic expression could be used; in this case, tooth is replaced by which literally translates as ‘limpet shell (traditionally used as a vegetable grater)’ or teye ‘clam shell/axe’ as a way of avoiding the bound form for ‘tooth’.

Here’s a handy table to help you get your head (or just head!) around avoiding the bound forms.

Bound | Free | Nominalisation | Idiomatic
rralnye-n ‘his, her ear’ | teleng ‘ear’ | arorongta ‘thing for listening, headphones’ | harrlengleng ‘listening’
lowon ‘his, her tooth’ | leo ‘tooth’ | atata ‘thing for cutting’ | ‘limpet shell (used as a grater)’; teye ‘clam shell, axe’
metan ‘his, her eye’ | marr ‘eye’ | ateter ‘thing for seeing, glasses’ | hal ‘road, path’; glas ‘glasses’
guhun ‘his, her nose’ | kuu ‘nose’ | akunuknuu ‘thing for smelling’ |
woulun ‘his, her hair’ | wovyul ‘hair’ | | ōrr ge mre ‘place which is above’

As time passes, so do traditions, and the older generations mourn the decline in respect for taboo relatives. They complain that younger generations now joke with their taboo relatives or put their arms around them. This art of speaking is being lost and the physical taboos are being eroded. However, this change is not new and has been going on for several generations. Some of the more extreme forms of respect are almost out of living memory. One of the village elders, Ephraim, recounted seeing how his grandmother, Mataran, displayed respect when returning from the garden with her vegetables one day. As she approached her home, she saw that one of her husband’s brothers was there. She came close, then crawled the rest of the way past her husband’s brother with her basket of vegetables over her shoulder, and only stood up again once she was in her doorway.

So the next time you are in Vanuatu, take care when climbing trees and make sure you know which of your relatives are nearby!

The Story of Aubergine


As the University of Surrey’s foremost (and indeed only) blog about languages and how they change, MORPH is enjoyed by literally dozens of avid readers from all over the world. But so far these multitudes have not received an answer to the one big linguistic question besetting modern society. Namely, what on earth is going on with the name of the plant that British English calls the aubergine, but that in other times and places has been called eggplant, melongene, brown-jolly, mad-apple, and so much more? Where do all these weird names come from?

I think the time has finally come to put everyone’s mind at rest. Aubergines may not seem particularly eggy, melonish, jolly or mad, but lots of the apparently diverse and whimsical terms for them used in English and other languages are actually connected – and in trying to understand how, we can get some insight about how vocabulary spreads and develops over time. It turns out that one powerful impulse behind language change is the fact that speakers like to ‘make sense’ of things that do not inherently make sense. What do I mean by that? Stay tuned to find out.

Long purple aubergine

To get one not-so-linguistic point out of the way first, there is no real mystery about eggplant (the word generally used in the US and some other English-speaking countries, dating back to the 18th century), which is not linked to anything else I am talking about here. It is hard to imagine mistaking the large, purple fruit in the photo above for any kind of egg, but that is not the only kind of aubergine in existence. There are cultivars with a much more oval shape, and even ones with white rather than purple skin: pictures like this, showing an imposter alongside some real eggs, make it obvious how the word eggplant was able to catch on.

Small white egg-shaped aubergine in an egg box between two real eggs

Meanwhile, aubergine, which is borrowed from French as you might expect, has a much more complex history, and can be traced back over many centuries, hopping from language to language with minor adjustments along the way. The plant is not native to the US, Britain or France, but to southern or eastern Asia, and investigating the history of the word will eventually take us back in the right geographical direction. Aubergine got into French from the Catalan albergínia, whose first syllable gives us a clue as to where we should look next: as in many al- words in the Iberian peninsula (e.g. Spanish algodón ‘cotton’), it reflects the Arabic definite article. So, along with medieval Spanish alberengena, the Catalan item is from Arabic al-bādhinjān ‘the aubergine’, where only the bādhinjān bit will be relevant from here on. This connection makes sense, because the Arab conquest had such an impact on the history of Iberia. And more generally, we have the Arabs to thank for the spread of aubergine cultivation into the West, and also – indirectly – for this charming illustration in a 14th-century Latin translation of an Arabic health manual:

Illustration featuring three people in front of a stand of aubergine plants
Page from the 14th c. Tacuinum Sanitatis (Vienna), SN2644

But bādhinjān is not Arabic in origin either: it was borrowed into Arabic from its neighbour, Persian. In turn, Persian bādenjān is a borrowing from Sanskrit vātiṅgaṇa… and Sanskrit itself got this from some other language of India, probably belonging to the unrelated Dravidian family. The word for aubergine in Tamil, vaṟutuṇai, is an example of how the word developed inside Dravidian itself.

That is as far back as we are able to trace the word. But the journey has already been quite convoluted. To recap, a Dravidian item was borrowed into Sanskrit, from there into Persian, from there into Arabic, from there into Catalan, from there into French, and from there into English – and in the course of that process, it managed to go from something along the lines of vaṟutuṇai to the very different aubergine, although the individual changes were not drastic at any stage. The whole thing illustrates how developments in language can go with cultural change, in that words sometimes spread together with the things they refer to. In the same way, tea reached Europe via two routes originating in different Chinese dialect zones, and that is what gave rise to the split between ‘tea’-type and ‘chai’-type words in European languages:

[Map created by Wikimedia user Poulpy, licensed CC BY-SA 3.0, cropped for use here]
This still leaves a lot of aubergine words unaccounted for. But now that we have played the tape backwards all the way from aubergine back to something-like-vaṟutuṇai, we can run it forwards again, and see what different historical paths we could follow instead. For example, Arabic had an influence all over the Mediterranean, and so it is no surprise to see that about a thousand years ago, versions of bādhinjān start appearing in Greece as well as Iberia. Greek words could not begin with b- at the time, so what we see instead are things like matizanion and melintzana, and melitzana is the Greek for aubergine to this day. There is no good pronunciation-based reason for the Greek word to have ended up beginning with mel-, but what must have happened is that faced with this foreign string of sounds, speakers thought it would be sensible for it to sound more like melanos ‘dark, black’, to match its appearance. That is, they injected a bit of meaning into what was originally just an arbitrary label.

Meanwhile the word turns up in medieval Latin as melongena (giving the antiquated English melongene) and in Italian as melanzana, and a similar thing happened: here mel- has nothing to do with the dark colour of the fruit, but it did remind speakers of the word for ‘apple’, mela. We know this because melanzana was subsequently reinterpreted as the expression mela insana, ‘insane apple’. This reinterpretation must have been helped by the fact that the aubergine (like the equally suspicious tomato) belongs to the ‘deadly’ nightshade family, whose traditional European representatives are famously toxic. So, again, something that was originally just a word, with no deeper meaning inside, was reimagined so that it ‘made sense’. Translating this directly, English speakers started calling the aubergine a mad-apple in the 1500s.

Parody of the "Keep Calm and Carry On" posters, reading "You don't have to be mad to work here but it helps"
Poster from a 16th c. aubergine factory

There are many more developments we could trace. For example, I have not talked at all about the branch of this aubergine ‘tree’ that entered the Ottoman Empire and from there spread widely across Europe and Asia. But instead I will return now to the Arab conquest of Iberia. This brought bādhinjān into Portuguese in the form beringela, and then when the Portuguese started making conquests of their own, versions of beringela appeared around the world. Notably, briñjal was borrowed into Gujarati and brinjal into Indian English, meaning that something-like-vaṟutuṇai ultimately came full circle, returning in this heavy disguise to its ancestral home of India. And to end on a particularly happy note, when the same form brinjal reached the Caribbean, English speakers there saw their own opportunity to ‘make sense’ of it – this time by adapting it into brown-jolly.

Brown-jolly is pretty close to the mark in terms of colour, and it is much better marketing than mela insana. But from the linguist’s point of view, they both reinforce a point which has often been made: speakers are always alive to the possibility that the expressions they use are not just arbitrary, but can be analysed, even if that means coming up with new meanings which were not originally there. To illustrate the power of ‘folk etymology’ of this kind, linguists traditionally turn to the word asparagus, reinterpreted in some varieties of English as sparrow-grass. But perhaps it is time for us to give the brown-jolly its moment in the sun.

Yesterday, Today and Tomorrow


How do we talk about time? This may seem a simple question with a simple answer; we are all human, so surely we all experience time the same way? That may be true, but it doesn’t mean that all languages organise time in the same way. This is arguably most apparent when it comes to talking about the days either side of the present day. We all live on Earth and therefore all experience a day-night cycle; we can all understand how one day follows another. However, the words we use to locate events in this cycle can vary wildly in their construction.

Let’s take a look at two languages, Scottish Gaelic and Sylheti, and see how their systems compare with that of English. All three of these languages belong to the same family, Indo-European, so we might expect them to show many similarities. And yet they vary considerably in how they talk about time.

Firstly, Scottish Gaelic. Like English, it distinguishes between ‘yesterday’, ‘today’ and ‘tomorrow’. The three terms share a consistent structure, a frozen prefix a(n)- attached to a morphologically opaque root: an-dè, an-diugh and a-màireach respectively. Furthermore, none of the Gaelic terms has any connection with the normal word for ‘day’, latha/là. Compare English, where yester-day and to-day both feature the word ‘day’, while to-day and to-morrow both feature a frozen prefix to- (historically a preposition meaning ‘at, on’). There are also single terms for ‘last night’ and ‘tonight’, a-raoir and a-nochd respectively, again with no immediately apparent connection to the normal term for ‘night’, oidhche. On the other hand, there is no single term for ‘tomorrow night’, so the compound expression oidhche a-màireach is used instead. Gaelic also has terms for ‘the day after tomorrow’ and ‘the day before yesterday’, an-earar and a bhòn-dè respectively, and the latter has a counterpart in a bhòn-raoir ‘the night before last’. English is also reported to have had similar terms, ereyesterday and overmorrow, though these have fallen out of use.

Gaelic is also in another respect slightly more regular than English in how it refers to parts of the day. While in English we have a split between ‘this morning’ and ‘yesterday morning’, Gaelic instead uses madainn an-diugh and madainn an-dè, where the former literally translates to ‘today morning’.

But none of this is really that surprising. All that distinguishes Scottish Gaelic from English in this respect is which time categories are given single indivisible terms rather than compositional expressions; the fundamental organisation of the system is still broadly similar to English. To see a far more radically different way of organising time words, we will now turn to Sylheti, an Indo-Aryan language spoken in north-eastern Bangladesh by around 9 to 10 million people and by perhaps a further 1 million in the diaspora, including most of the British Bangladeshi community.

Here, instead of distinguishing between ‘yesterday’ and ‘tomorrow’, we find a single term xail(ku), contrasting with aiz(ku) ‘today’ (the -ku is a suffix which can optionally appear on many ‘time’ words, such as onku ‘now’ or bianku ‘(this) morning’). The two senses ‘tomorrow’ and ‘yesterday’ can be distinguished by combining the term with goto ‘past’ or agami ‘future’, but just as commonly the distinction is marked solely by whether the verb is in the past or future tense, e.g. xailku ami amar bondu dexsi ‘I saw my friend yesterday’ vs. xailku ami amar bondu dexmu ‘I will see my friend tomorrow’.

Nor is this an isolated case in the language; it reflects a consistent pattern. In the same way, foru can mean either ‘the day before yesterday’ or ‘the day after tomorrow’ depending on context, and toʃu likewise refers to a day one step further removed, three days before or after today.

Table of day and night terms in English, Gaelic and Sylheti respectively
Visualising the systems

Nor is Sylheti unique in using this kind of system; it is also found in many parts of New Guinea, for example. Yimas, a language of northern New Guinea, also uses the same term ŋarŋ for both ‘yesterday’ and ‘tomorrow’, urakrŋ for ‘two days removed’ and so on, all the way up to manmaɲcŋ for ‘five days removed’. Once again whether the reference is to the past or future is carried by the choice of tense on the verb, though Yimas has a far more complex system than that seen in Sylheti, for instance distinguishing a near past -na(n) from a more remote past -ntuk~ntut.

Sylheti also makes finer-grained distinctions between parts of the day than either English or Scottish Gaelic. For example, to say ‘in the morning’ one must decide whether one is talking about the early morning (ʃoxal) or the mid to late morning (bian). Additionally, while forms such as ‘yesterday/tomorrow afternoon’, ‘the night before last/after next’ and ‘yesterday/tomorrow morning’ use compound expressions (xail madan, foru rait and xail bian/ʃoxal respectively), to express ‘this morning/this afternoon/tonight’ the word for the part of the day (perhaps with the oblique suffix -e or the time suffix -ku) is sufficient by itself, for example amra ʃoxale Sylheʈ aisi ‘We arrived in Sylhet this morning’ or ami raitku dua xotram ‘I am praying tonight’ (with rait ‘night’).

This is just one small part of the temporal vocabulary, drawn only from languages of a single family, and yet we already see great variation in how time is organised and discussed. It is not so much that these groups have fundamentally different conceptions of time, since these languages share a common ancestor and are separated by only a few thousand years. Instead, it is a testament to the fluidity of time itself: the words used to refer to it easily shift in meaning and are reorganised over generations.

Word games


You have almost certainly heard of Wordle, the viral word game by powerlanguage, recently bought by the NYT. In the original game, a secret 5-letter English word is chosen every day, and players have 6 tries to guess it. Each guess is answered by colored cues: green for “correct letter in the correct place”, yellow for “correct letter in the wrong place”, gray for “incorrect letter”. The concept of Wordle is not new: it resembles games such as Jotto, Lingo and Mastermind.
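The color rules can be stated precisely in a few lines. The sketch below is a common two-pass approach, not the original game's source code; in particular, the treatment of repeated letters (a guessed letter is flagged as misplaced only as many times as it still occurs unmatched in the target) is an assumption about how such games usually behave. The function name `score_guess` and the labels `correct` (green), `present` (right letter, wrong place) and `absent` (gray) are illustrative.

```python
from collections import Counter


def score_guess(guess: list[str], target: list[str]) -> list[str]:
    """Return one label per position: 'correct', 'present' or 'absent'.

    Both arguments are lists of letters rather than raw strings, so a letter
    written with several Unicode code points still occupies a single slot.
    """
    if len(guess) != len(target):
        raise ValueError("guess and target must have the same length")
    result = ["absent"] * len(guess)
    # Target letters not matched exactly; only these can produce 'present'.
    unmatched = Counter()
    # First pass: exact matches (green).
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            result[i] = "correct"
        else:
            unmatched[t] += 1
    # Second pass: right letter in the wrong place, limited by remaining counts.
    for i, g in enumerate(guess):
        if result[i] != "correct" and unmatched[g] > 0:
            result[i] = "present"
            unmatched[g] -= 1
    return result
```

For example, guessing crane against the target cater gives correct, present, present, absent, present, while guessing llama against hello marks only the two l's as present, because hello contains exactly two of them.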

A sample game of Mastermind.

While some may have been annoyed by the endless stream of three-color square emojis reporting players’ successes and inundating social media, I have been delighted by the productivity displayed by the many variants: in hello wordl, play an endless number of games; in dordle, quordle and octordle, guess several words at once; in squardle, play in two dimensions; in nerdle, guess a mathematical formula; in absurdle, the game does its best to dodge your guesses, etc.

Quordle lets you play 4 games at once

Some derived games transform the game mechanics, but the simplest variation is to switch the vocabulary (have you tried queerdle or lordle of the rings?) or the language. Indeed, Wikipedia already lists more than 40 language variants of Wordle. If my social feeds are to be believed, many linguists have found that they could play in languages they didn’t speak, provided they had some intuition about the language’s phonotactics and spelling patterns. I was, however, quite disappointed to see that many versions retained the English-centric assumption of one letter per Unicode character and avoided diacritics altogether, leading to strangely impoverished typography. This is the case, for example, with the French wordle, Le Mot.

The French wordle accepts “meler”, but not “melez”

While playing variants, I realized that a wordle is only as good as its word list: some games rely on lexicons which contain only citation forms (infinitives for French verbs) and exclude the many other inflected forms, leading to a frustrating game experience. For example, in Le Mot, one can play mêler (or more exactly, meler) “to mix”, but not meles “(you) mix”. As it happens, well-curated word lists that include inflected forms are a Surrey Morphology Group (SMG) specialty: lexicons and dictionaries are a common product of language documentation, and, as its name indicates, the SMG has a particular focus on morphology. We have been maintaining open inflectional databases since the 1990s. After some discussion, we agreed to start by producing two wordle-like games, corresponding to the two main lexicons in the SMG databases: the Dictionary of Archi and the Nuer Lexicon.

Nuerdle interface
SMG wordle in Nuer: Nuerdle

The Nuer language, or Thok Nath, is a West Nilotic language spoken by approximately 900,000 to two million people in South Sudan and Ethiopia, as well as in diaspora communities throughout the world. The SMG has created an interactive online dictionary for it. From this lexicon, I extracted 6218 words, mostly verbs and nouns, with a few other parts of speech represented. All target words are drawn from this set. However, using only the lexicon would risk rejecting many words that speakers know but that are not documented in the lexicon. So I also extracted all of the words from the Nuer translation of the Bible.¹ This led to a total lexicon of 13476 words.²
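The split between target words (from the dictionary) and accepted guesses (dictionary plus Bible) can be sketched as follows, assuming both sources have already been tokenised into lists of word strings. The function name `build_word_lists` and the simple capital-letter test, which mirrors the exclusion of capitalised Bible words described in the footnote, are illustrative assumptions rather than the SMG's actual pipeline.

```python
def build_word_lists(dictionary_words, bible_words):
    """Return (targets, allowed_guesses) as sets of word strings.

    Hypothetical helper: inputs are assumed to be pre-tokenised words.
    """
    # Every answer comes from the dictionary, so it always has a definition.
    targets = set(dictionary_words)
    # Bible tokens only widen the set of accepted guesses; empty tokens and
    # capitalised tokens (a rough filter for proper names) are skipped.
    extra = {w for w in bible_words if w and not w[0].isupper()}
    allowed_guesses = targets | extra
    return targets, allowed_guesses
```

Keeping the targets dictionary-only means every answer can be shown with a definition, while the larger guess list avoids rejecting real words the dictionary happens to miss.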

Archidle interface
SMG wordle in Archi: Archidle

Archi is a Daghestanian language of the Lezgic group, spoken by about 1200 people in Daghestan. At the SMG, we created a dictionary of Archi, with Archi entries (in both orthographic and phonetic forms) translated into Russian and English, from which I extracted 3626 words for our wordle puzzle. For now, we do not have any more words for Archi, but we are working on it. In the game, we have ignored the stress diacritics, which players might not find intuitive.
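Ignoring stress diacritics amounts to deleting them before words enter the game. A minimal sketch, assuming stress is written with the combining acute accent (U+0301), as is common in Cyrillic dictionaries; the Archi dictionary may encode stress differently, so the mark is a parameter here, and `strip_stress` is a hypothetical name.

```python
import unicodedata

# Assumption: stress is marked with U+0301 COMBINING ACUTE ACCENT.
STRESS_MARK = "\u0301"


def strip_stress(word: str, mark: str = STRESS_MARK) -> str:
    """Remove every occurrence of the stress mark from a word."""
    # NFD splits precomposed letters such as "á" into base letter + mark,
    # so the stress mark can be removed wherever it occurs.
    decomposed = unicodedata.normalize("NFD", word)
    # Recompose afterwards so letters like "й" (и + combining breve)
    # end up as single characters again.
    return unicodedata.normalize("NFC", decomposed.replace(mark, ""))
```

For instance, the (Russian, purely illustrative) stressed form за́мок comes back as замок, while й keeps its breve because only the acute accent is removed.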

Two Nuer Keyboards. On the left, from a mobile app. On the right, our keyboard.
Nuer keyboards: from a mobile app (left), or from our wordle game (right).

To create the SMG wordles, I started from the open-source code of the replayable version, hello wordl. To keep the game closer to the original, I removed the replay function. However, I kept the option to play with word lengths from 4 to 7 letters, so each day you can play 4 games in each language. A major challenge was that Nuer orthography includes diacritics, which required rewriting large parts of the game, as it assumed that each letter was written with a single character. Another difficulty was that neither language has a single, widely used keyboard layout. For Nuer, we created one based on a mobile keyboard, which we extended to include more diacritics.
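The core of that rewrite is to stop treating each Unicode code point as a letter. A minimal sketch, assuming a letter is a base character followed by any combining diacritics; `split_letters` is a hypothetical name, and this is not the SMG game's actual code.

```python
import unicodedata


def split_letters(word: str) -> list[str]:
    """Split a word into letters, attaching combining diacritics
    to the preceding base character."""
    letters: list[str] = []
    # NFC first, so that precomposed forms such as "ä" count as one character.
    for ch in unicodedata.normalize("NFC", word):
        if letters and unicodedata.combining(ch):
            # Non-zero combining class: a diacritic belonging to the previous letter.
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters
```

For example, the Nuer word ri̱et is 5 code points (the i carries U+0331 COMBINING MACRON BELOW) but comes back as 4 letters: r, i̱, e, t. `unicodedata.combining` only recognises marks with a non-zero combining class; a more complete treatment would segment text into Unicode extended grapheme clusters (UAX #29), for example with the `\X` pattern of the third-party `regex` package. Digraphs, if a game chose to treat them as single letters, would need an explicit list.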

Two Cyrillic Keyboards. On the left, standard Russian layout. On the right, our keyboard for Archi.
Cyrillic keyboards: Russian keyboard from a mobile app (left), or Archi keyboard from our wordle game (right).

In both cases, we strove to make the game playable by learners, linguists, and curious people who do not speak Archi or Nuer. For this reason, we made the default word length 4 letters rather than 5, which makes the game easier. We also added short English definitions for all words in our lexicons, with links to their full definitions in our resources. Nuer words taken from the Bible are not always present in our Nuer lexicon, so some Nuer words can appear without translations. Finally, to help beginners get started, we provide a few example words of the correct length each day, hidden by default, which can be used to start playing.

Ri̱et: "word" in Nuer
A word played in Nuerdle, with translation in the margin

Besides learning the languages, scouring the dictionary, or using the daily hint words, how can you get better at the Nuer or Archi wordle? It helps to pay attention to letter frequencies and to play words with frequent letters, so as to narrow down the pool of potential words quickly. For the English wordle, some have calculated the optimal starting word. Rather than risk spoiling the game, I provide below the relative frequencies of the 5 most frequent letters at each position (1 to 7) in Nuerdle and Archidle words. This should give an idea of which letters are frequent at each position. The colors are assigned according to overall frequency in the lexicon, with light greens more frequent than dark blues. Each bar represents the frequencies of the five most frequent letters in one word position (from 1 to 7), ignoring the other, less frequent letters. The height of each colored segment, between two white lines, represents that letter’s frequency: e.g. in Nuer, a word in our lexicon starts with k around 10% of the time, and with around 12% of the time. If there is interest, a future blog post could explore the frequent sequences and letter patterns in either language further.

Frequency of each character in Nuer words in our lexicon, per position
Frequency of each character in Archi words in our lexicon, per position
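Per-position frequencies like those in the charts can be computed with a simple count. A minimal sketch, assuming each word is already a sequence of letters (plain strings work when every letter is a single code point; Nuer words with combining diacritics need splitting into letters first) and that the frequency at a position is relative to the words long enough to have that position. The SMG charts may have been normalised differently, and `top_letters_by_position` is a hypothetical name.

```python
from collections import Counter


def top_letters_by_position(words, max_len=7, k=5):
    """For positions 1..max_len, return the k most frequent letters with
    their relative frequencies, as a list of lists of (letter, share) pairs."""
    counts = [Counter() for _ in range(max_len)]
    for word in words:
        # word[:max_len] works for both strings and lists of letters.
        for i, letter in enumerate(word[:max_len]):
            counts[i][letter] += 1
    result = []
    for counter in counts:
        total = sum(counter.values())
        if total == 0:
            result.append([])  # no word in the list is this long
        else:
            result.append([(letter, n / total) for letter, n in counter.most_common(k)])
    return result
```

For the toy list abca, abd, ca, position 1 gives a with a share of 2/3 and c with 1/3, and positions 5 to 7 come back empty.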

Finally, since this is a morphology blog, I would like to draw your attention to the interesting way in which English acquired a new -dle suffix. The original game is called Wordle, a combination of its creator’s last name, Wardle, and word. As the game went viral, the apparent suffix came to mean “game in the Wordle family” (or maybe “online guessing game”). Interestingly, even though the most obvious decomposition of wordle seems to be word+le, the productive suffix is -dle, not -le. Could this be because keeping more of the shared material makes the family resemblance between the new words more obvious? Isn’t analogy mysterious? In any case, after hesitating between ri̱etle (from ri̱et “word” + le, in Nuer) and č’atle (from č’at “word”, in Archi), we settled on calling our games Archidle and Nuerdle.

  1. Excluding words starting with a capital letter, in order to avoid proper names.
  2. If you want to suggest missing Nuer words, the Nuer lexicon has a module for suggestions!