Vanuatu: an archipelago full of languages and their names

The Republic of Vanuatu, an archipelago with over 130 indigenous languages, has a myriad of ways of naming them. With so many islands and languages I won’t be able to tell you the history of all those names in such a short space but hope to highlight some of the more interesting naming techniques.

There are two main ways that languages can be named: by the people who speak them (an endonymic name) or by outsiders (an exonymic name). In the case of Vanuatu, this has led to a confusing array of multiple names for the same language.


Several of the languages of Pentecost Island are named after indigenous words meaning ‘what’ – Sa, Ske, Apma and Hano are all named this way. Did these names arise due to brief exchanges between the different language communities? Was the question, ‘What is your language called?’ met with a rather confused reply of ‘What?’. However amusing this is, it is probably not how these names came about. The terms for ‘what’ are actually linguistic identifiers, words in the different languages that set them apart from each other and were highlighted by the different language communities – ‘we say sa here, but they say ske there’.

The Hano language was originally known to Europeans as either Lamalanga or Loltong, after two of the larger villages where the Christian missions were located.1 Nowadays, speakers of Hano prefer to call their language Raga. This is the endonymic term used not only for the language, but also for the northern part of Pentecost, where the language is spoken, and for the island as a whole.2 Of course, to make things more complicated, there are other exonymic names for Raga, such as Kihip, given to it by the speakers of Apma.


Two of the languages of Malekula Island, Naman and Sang, are both endonymic expressions of surprise.3 Naman, apart from being a palindromic language with a palindromic ISO code, also has a surprising history as it was previously known as Litzlitz, the name of a village where some of the speakers still live. Litzlitz is itself a colonial twisting of the true endonymic name of the village – Lenslens – named after the pieces of dead coral which are washed ashore from the reefs and make up many of the beaches in the archipelago.


Many languages are simply named after the location where they are spoken, such as the place names used by missionaries on Pentecost Island above. One language, North Ambrym, is named after the part of the island it is spoken on – Ambrym. The island is believed to have been named when Captain Cook explored the archipelago and came ashore near the village of Fonah in the northern part of Ambrym Island. He is said to have exchanged oranges with the local chiefs, who gave him yams in return and said in the local language, North Ambrym, am rrem ‘your yams’.

Captain Cook receiving yams from the chiefs of Fonah – from a North Ambrym story book told by Benjamin Toforr and illustrated by Zakary Bong.

So, the name for the language spoken in the northern part of the island is a concoction of a cardinal direction and an exonymic mangling of an indigenous phrase. As the North Ambrymese say, Captain Cook had a heavy tongue and misspoke our words. Interestingly, a very similar story for the naming of Epi Island is told by the Bierebo language speakers there too – that when Cook came ashore he was given yams, enquired about their name, and mispronounced the reply, yupi, as epi.4

There is a small problem with these wonderful stories – Captain Cook never actually set foot on Ambrym or Epi and merely sailed past. Of course, this does not mean that a similar exchange of yams and oranges did not happen, but that maybe it involved a different European navigator or missionary.5

So, if not named after an exchange of yams, where does the name Ambrym come from? Captain Cook sailed past Ambrym and on to Malekula Island, where he went ashore at Port Sandwich (named by Cook after the Earl of Sandwich). There, the indigenous group who speak Port Sandwich, or Lamap as it is known endonymically after the place it is spoken, told Cook the names of the surrounding islands, Ambrym being one of them. So Ambrym is actually an exonymic language name. I believe the name Ambrym itself derives in part from the word meaning fire in the Port Sandwich language, gamb [ɣaᵐb], and in many other Malekula languages, simply amb. Unfortunately, I haven’t been able to figure out what the second part of the name – rim – means.

What have Ambrym and fire got to do with anything? In the traditional mythology of several of the culture groups of eastern Malekula, especially on the small islands of Atchin, Vao and Wala off the eastern coast, the souls of the dead would be ferried across to Ambrym and then climb the volcano, the land of the dead, to spend their afterlife.6

The twin volcanoes of Ambrym are highly visible in the night sky, giving a rather other-worldly sight, as seen from the Maskelyne islands, off the southern coast of Malekula.

Word, Speech & Language

Nowadays, the languages of Ambrym are shedding their exonymic names and reclaiming their endonymic names. The endonymic language names of Ambrym Island are nearly all related to the meaning ‘word, speech, language’ along with a demonstrative such as ‘here’ or ‘of this place’: Rral (North Ambrym), Daakie, Daakaka, Dalkalaen, Raljako, Raljaja and Vatlongos. But one smaller language also spoken on Ambrym, Fanbak, still has a place name, meaning ‘under the banyan tree’.

This is it!

Finally, the two languages of northern Ambrym – North Ambrym, which has two dialects, and Fanbak – are often referred to by speakers using an expression meaning ‘this is it’ or ‘here it is’. The two dialects of North Ambrym are referred to as Ngeli and Ngeye, whereas Fanbak is called Ngelē. Again, these are linguistic identifiers, similar to the words for ‘what’ in the Pentecost languages, or the terms of surprise used for the languages in Malekula.

There may be over 130 languages in Vanuatu, but there are certainly even more names for them!

The linguistic archaeology of feet

There’s been excitement recently about evidence that humans had set foot in the Americas as much as 22,500 years ago, pushing back the previous best estimate by almost ten thousand years. And by ‘set foot’, I mean literally. The tell-tale new evidence comes to us in the form of imprints left by human feet in a particularly well-preserved mudflat in New Mexico. So far, the humans themselves have not been uncovered by archaeologists, but their characteristic mark upon the mud has endured.

When linguists peer into the past, we also will occasionally use the imprints, left by something which has otherwise been lost, to infer its presence long ago — all of which brings us to the topic of feet, and not the kind that you’d use to walk across a mudflat, but the literal English word ‘feet’, which itself contains a wonderful imprint of a long-lost vowel.

Our story begins with the fact that in English, the word ‘feet’ is a little odd. It’s a plural that doesn’t end in ‘s’. As any child will tell you, you can’t get away with saying ‘foots’ for the plural of ‘foot’ for very long before someone bigger than you corrects it to ‘feet’. However, given that most English nouns do use an ‘s’ plural, it’s entirely sensible to ask why ‘feet’ is different. (Of course, ‘feet’ isn’t absolutely unique: English contains a select club of other, similar plurals like ‘geese’ and ‘teeth’, to which we’ll return in a minute.)

The tale of ‘feet’ begins around two millennia ago, when it was in fact a regular plural word. In proto-Germanic, the singular form would have been ‘fōt-s’ (pronounced approximately as fohts, where ‘ō’ is a long ‘o’ sound) and its corresponding plural ‘fōt-iz’, constructed with a simple plural suffix ‘-iz’. Over the following centuries, the sounds at the end of the plural form were worn away and eventually lost, as often happens during language change. However, before the suffix disappeared entirely, the ‘i’ vowel in it left its imprint on the ‘ō’ vowel, changing it to ‘ȫ’, which is to say ‘fōtiz’ became ‘fōti’ then ‘fȫti’ then ‘fȫt’ which by Old English had become ‘fēt’ and is now ‘feet’. In the meantime, the singular form ‘fōts’, which contained no ‘i’ vowel, changed very little indeed: it lost its suffix ‘-s’, becoming ‘fōt’ and then modern English ‘foot’. A similar story lies behind the plurals ‘geese’ and ‘teeth’: an original suffixal vowel ‘i’ changed ‘ō’ into ‘ȫ’, before disappearing, then ‘ȫ’ became ‘ē’.
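For readers who like to see the steps laid out mechanically, here is a minimal sketch in Python of the derivation just described. It treats each sound change as an ordered rewrite rule applied to the written form. The rule list, the regular expressions and the function name `derive` are illustrative assumptions rather than any standard notation, and the sketch covers only the changes mentioned in the paragraph above.

```python
import re
import unicodedata
from typing import List


def nfc(text: str) -> str:
    """Use precomposed characters so that ō, ȫ and ē are single code points."""
    return unicodedata.normalize("NFC", text)


# Ordered rules: (label, regular expression, replacement).
# The ordering follows the narrative above and is a simplification.
RULES = [
    ("final -z is lost", r"z$", ""),
    ("final -s is lost", r"s$", ""),
    ("i-umlaut: ō becomes ȫ if an i follows later in the word",
     nfc(r"ō(?=.*i)"), nfc("ȫ")),
    ("final -i is lost", r"i$", ""),
    ("ȫ unrounds to ē", nfc("ȫ"), nfc("ē")),
]


def derive(form: str) -> List[str]:
    """Apply each rule once, in order, recording every stage that changes."""
    stages = [nfc(form)]
    for _label, pattern, replacement in RULES:
        changed = re.sub(pattern, replacement, stages[-1])
        if changed != stages[-1]:
            stages.append(changed)
    return stages


if __name__ == "__main__":
    print(" > ".join(derive("fōtiz")))  # the plural
    print(" > ".join(derive("fōts")))   # the singular
```

Run on the plural, the rules give the chain fōtiz, fōti, fȫti, fȫt, fēt; the singular only goes from fōts to fōt, because it never contained an ‘i’ to trigger the vowel change.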

You might say that the ‘i’ vowel left its imprint upon original ‘ō’ in the form of the altered vowel ‘ȫ’. One tool which linguistic archaeologists put to good use is our knowledge of the characteristic imprints that one sound can leave upon another. In the case of the long-lost ‘i’ vowel, the imprint even has a name: umlaut. Historical umlaut is also what lies behind plurals like ‘mice’ and ‘men’.

Armed with the background knowledge that lost ‘i’ vowels changed ‘ō’ into ‘ȫ’, and in doing so gave rise to modern English alternations between ‘oo’ and ‘ee’, we can now go fossicking through the vocabulary for more lost ‘i’ vowels. Another suffix that was lost over the centuries was a causative suffix, which related nouns to verbs, such as ‘blood’ to ‘bleed’, or ‘food’ to ‘feed’: as you’ll have guessed, the verbs once contained a now-lost ‘i’. In some cases, pairs of sibling words such as these have grown apart over time. For instance, if you were to decide someone’s fate (or their ‘doom’) then you’d be judging them (or ‘deeming’ them), though as you can see, I had to produce a fairly contrived context to highlight the relatedness of ‘doom’ and ‘deem’.

Umlaut caused by a now-lost ‘i’ also crops up in several nouns ending in ‘-th’: compare not only ‘strong’ with ‘strength’, ‘long’ with ‘length’, or ‘broad’ with ‘breadth’, but also ‘hale’ with ‘health’ and ‘foul’ with ‘filth’.

feet made filthy by umlaut!

Over decades of meticulous work, linguists have uncovered much about how languages around the world change over time, though much more still remains to be accounted for. One of the many lingering questions is what the conditions are, which favour the continued survival of idiosyncratic word forms like ‘feet’, long after they have lost their regularity. We know that many irregular words, such as the Old English plural ‘bēc’ for ‘books’ (corresponding to singular ‘bōc’), get removed over time, yet others persist for millennia. It’s an ongoing task for linguists to understand why some footprints remain while others get washed away.

Isn’t it iconic? creating signs in sign languages

If I asked you what you think of when I say the word iconic, you most probably would name David Bowie, Big Ben, or fish and chips. That is if you are not a linguist. We use this word in a different sense. It refers to elements in a language that have some sort of resemblance to the thing they refer to in the real world. The form of a word is not completely random. If you think about it, the adjective iconic is related to the noun icon, which, in its original meaning, denotes a painting that resembles a holy figure. Out of all the languages in the world, sign languages are especially famous for having a lot of iconic elements. Let’s see how it works!

Perhaps the most often cited type of iconicity is word-level iconicity. Basically, it refers to signs that look like what they mean. Take a look at the Russian Sign Language sign CANDLE.

The sign CANDLE in Russian Sign Language.

Here, the signer “makes a picture” of a candle with his hands: his left hand, bent into a fist, stands for the body of a candle, and his right hand imitates flames by slightly shaking on top of it. At first glance, the idea is very simple. You might even wonder why linguists would spend time researching this phenomenon. Here is a candle, and here is an objective and logical way to depict this candle with the hands. But the process is more complex than it appears. First, note that not all candles look the same. Some of them have very thin bodies (like birthday candles), others are flat (like tealights), and don’t forget sophisticated arty candles like the ones below in the shape of Halloween characters.

Halloween candles.

This means that the Russian Sign Language sign CANDLE doesn’t depict some kind of objective candle. Instead, it portrays the picture of a candle it considers prototypical. This already can add quite a lot of variation: we can safely assume that there would be sign languages that choose a different candle to depict. Indeed, Italian Sign Language has a taller, more elegant looking candle in mind. Notice how the signer draws its tall body in the beginning.

The sign CANDLE in Italian Sign Language.

But even if two or more languages have the same picture in mind, there are still a lot of different ways to express it. For example, in German Sign Language, you are supposed to imitate lighting a match (that would in turn light the candle).

The sign CANDLE in German Sign Language.

Whereas in Greek Sign Language, you would show blowing out a candle instead.

The sign CANDLE in Greek Sign Language.

And even if you choose to express the same aspects of the same picture, you can still do it differently. For example, Brazilian Sign Language uses the same imagery as Russian Sign Language, but it shows the flames of the candle with all five fingers instead of just three.

The sign CANDLE in Brazilian Sign Language.

In order to account for this wide variability, Sarah Taub came up with a neat model of iconic signs. According to her, the creation of an iconic sign happens in three steps: (1) image selection: choosing an appropriate image; (2) schematization: choosing the important parts of the image to represent; and (3) encoding: creating the form of the sign. During the first step, one selects a prototypical image to represent; then, during the second step, one chooses what elements of this image will be expressed by the sign, and what elements will be left out. Finally, the last step is to decide how these elements will be expressed, i.e., what handshapes will be used and how they will be joined together. Sarah Taub explains this model using the example of the American Sign Language sign TREE.

The sign TREE in American Sign Language.

Here, one starts by choosing what tree species to represent and what kind of information to encode, such as tactile images of how bark and leaves feel, auditory images of leaves rustling, or visual images of a tree shape and/or colour. In the case of American Sign Language, the choice fell on the shape of a tree with a tall trunk and a leafy treetop. Then one creates a mental representation of a tree to decide what pieces of it will be encoded. American Sign Language selected the trunk of a tree, the branching treetop, and the ground in which the tree grows. And finally, one needs to choose a physical form to represent each piece. In this example, a spread hand represents the branching structure, an upright forearm represents the trunk, and a horizontal forearm and palm represent a flat surface.

Sarah Taub’s model of iconic signs.

Try this yourself! Can you come up with a sign for, say, a flower? Think of the flowers you know and choose one! Will it be a dandelion, a camomile, a rose, or maybe a funny (and slightly scary) monkey orchid?

Monkey orchids.

Then think of pieces you want to represent. Will it be just the flower itself? How many petals? How big are they? Do you want to encode the stem and the leaves as well? Or maybe your flower has thorns? And what about the soil? Finally, play with your hands or maybe even with your whole body and find a way to encode these pieces.

When you have created your masterpiece, go ahead and check how different sign languages did it! The best place to go is the spread-the-sign website. You can just type the word ‘flower’ and click on the flag of a language you are interested in. Of course, the difference between your representation and a sign of a sign language will be that you are free to choose from whatever parts and positions of your body you can come up with, whereas sign languages are limited by handshapes and movements that exist in the language. However, you’ll still get a good taste of iconicity!

SMG – I’d Arapaho, Roon, Sala, Tubar and Nara, but alas no Oroha paradigms

A palindrome is a linguistic delight: it reads the same in both directions. For example: level. Or Anna, or indeed Hannah. This is a visual trick: if you record yourself saying one of these words and play the recording backwards, it won’t sound exactly the same.

Palindromes hit the big time in the parrot sketch. They were also promoted by ABBA, with their top hit SOS!

Here’s a nice one from North Ambrym (an Oceanic language spoken in Vanuatu): rrirrirr ‘sound a rat makes when you try and kill it but you miss it’. And a long one from Estonian: kuulilennuteetunneliluuk ‘bullet flying trajectory tunnel’s hatch’. I’m not sure that one is used much (except in blogs about palindromes).

We can go up a level (!), as it were, to palindromic phrases. A famous one of these is:

A man, a plan, a canal – Panama!

This has been around at least since 1948. It has often been extended, as in this version due to Guy Jacobson:

A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal – Panama!
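Phrase palindromes only work if we agree to ignore spaces, punctuation and capital letters. As a minimal sketch of that convention in Python (the function names `letters_only` and `is_palindrome` are my own, and keeping only letters and digits is an assumption about what counts):

```python
import unicodedata


def letters_only(text: str) -> str:
    """Lower-case the text and keep only letters and digits."""
    # NFC keeps an accented letter as one code point where a precomposed
    # form exists; letters without one would need extra handling.
    text = unicodedata.normalize("NFC", text).casefold()
    return "".join(ch for ch in text if ch.isalnum())


def is_palindrome(text: str) -> bool:
    """True if the cleaned-up text reads the same in both directions."""
    cleaned = letters_only(text)
    return len(cleaned) > 0 and cleaned == cleaned[::-1]


if __name__ == "__main__":
    for phrase in [
        "level",
        "A man, a plan, a canal – Panama!",
        "A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal – Panama!",
        "blog",
    ]:
        print(phrase, "->", is_palindrome(phrase))
```

Like the palindromes themselves, this is a trick of spelling: it compares written characters, not sounds.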

And here’s a Russian sentence palindrome: Рислинг сгнил, сир. ‘the Riesling has gone off, sir’. More Russian and French sentence palindromes can be found online, and there are even songs based on such palindromes.

There are palindromes in American Sign Language, too.

Not surprisingly, palindromes don’t translate. Though we can go up another level (!) of cleverness, to the bilingual palindrome: I love / e voli. This is half English, half Italian, and overall a palindrome. More of these can be found online. It’s truly amazing what people can create, including whole poems as palindromes.

Some time ago, I mentioned to linguist colleagues that Malayalam (a Dravidian language of southern India) is a palindromic language. One colleague’s eyes opened wide, and he asked whether it was palindromic at the word level or the sentence level. What a great idea! Of course, it’s just the name which is a palindrome (just as Anna is a palindrome but that doesn’t make Anna a palindromic person – there are deep issues here: what does a name refer to?).

It turns out that there are over seventy “palindromic languages”, including some that are central to our research in SMG, notably Iaai (spoken in New Caledonia). Here are some more: Efe, Ewe, and Atta.

What then of E (also called Wuse/Wusehua), a Tai-Chinese mixed language, of Guangxi, China? Yes, it’s a palindrome, just not a very impressive one. Just as the English pronoun I is a palindrome, though hardly one to get excited about (unless you’re called Anna or Hannah of course). But it gets much better. You may have noticed that linguists increasingly give three letter codes after language names. These are the ISO codes that we use to uniquely identify a language, to make sure that we’re talking about, say, the language Aja (a Nilo-Saharan language of Sudan), ISO code aja, and not Aja (a Niger-Congo language of Benin), ISO code ajg. So, what is the ISO code for the language E? It’s eee. The language name and the code are both palindromes! Similarly there’s U (an Austroasiatic language of the Yunnan Province of China), ISO code uuu.

Here are the languages which are doubly palindromic (name and ISO code):

Name     ISO code
E        eee
Efe      efe
Ewe      ewe
Iaai     iai
Kerek    krk
Naman    lzl
Mam      mam
Nen      nqn
Ofo      ofo
Ososo    oso
Utu      utu
U        uuu
Yoy      yoy
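The table can be double-checked mechanically. Here is a small sketch in Python that tests each name and each code separately; the helper names are made up for illustration. The Malayalam row at the end is not from the table: its ISO 639-3 code, mal, is added here only as a contrast, since the name is a palindrome but the code is not.

```python
# (name, ISO code) pairs copied from the table above.
TABLE = [
    ("E", "eee"), ("Efe", "efe"), ("Ewe", "ewe"), ("Iaai", "iai"),
    ("Kerek", "krk"), ("Naman", "lzl"), ("Mam", "mam"), ("Nen", "nqn"),
    ("Ofo", "ofo"), ("Ososo", "oso"), ("Utu", "utu"), ("U", "uuu"),
    ("Yoy", "yoy"),
]


def reads_same_backwards(text: str) -> bool:
    """Compare the text with its reversal, ignoring capital letters."""
    folded = text.casefold()
    return folded == folded[::-1]


def doubly_palindromic(name: str, code: str) -> bool:
    """True only if both the name and the ISO code are palindromes."""
    return reads_same_backwards(name) and reads_same_backwards(code)


if __name__ == "__main__":
    for name, code in TABLE + [("Malayalam", "mal")]:  # contrast row, not in the table
        print(f"{name:10} {code}  {doubly_palindromic(name, code)}")
```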

A real star is Naman, whose ISO code is quite different, lzl, but still palindromic. Where does that come from? Well, the language has an alternative name, Litzlitz, so when it’s not a palindrome it’s a reduplication!

Back to the tricky use of “palindromic language”. Iaai is a palindromic name. As we’ve seen, its ISO code iai is also a palindrome. And the language does have some very nice palindromes:

  • aba ‘caress’
  • ee ‘locative – near the interlocutor’
  • ii ‘to suck’
  • iei ‘to hurt, cause pain’
  • ikiiki ‘repugnant’
  • iwi ‘rudder’
  • komok ‘sick’
  • maam ‘your manner’
  • mem ‘Napoleon fish (Cheilinus undulatus)’
  • omoomo ‘women’
  • nokon ‘his/her infant’
  • oṇo ‘Barracuda (Sphyraena sp.)’
  • öö ‘spear’
  • ölö ‘mount, embark, disembark’
  • ölö ‘legume (Pueraria sp.)’
  • u ‘an old word for yam’
  • uu ‘fall from a height, chop down (of tree)’
  • ûû ‘a dispute, to dispute’
  • ûcû ‘similar, same’ (a nice meaning for a palindrome!)
  • ûcû ‘to exchange, buy, shop’

It would be impressive if you could read this post backwards, and have it make sense. But that wouldn’t be a BLOG but a GLOB, the latter being an instance of a Semordnilap, but that is another story. For now, we welcome your favourite palindromes, in any language, in the comments.

For examples, thanks to Jenny Audring, Sacha Beniamine, Marina Chumakina, Mike Franjieh, Erich Round and Anna Thornton, and for the title (you’ve guessed what sort of title that is!), thanks to Steven Kaye.

Sign language mythbusters

We have all heard of sign languages. Most of us have seen people talking to each other using their hands and body movements instead of the voice: on the street, at a train station, or in a noisy café. We probably even felt a slight jolt of envy, thinking about how much easier it must be for them to communicate when they are surrounded by loud music, laughter, and chatter. Curiously, however, very few people know what sign languages actually are. Unless you are a sign language user and/or a linguist, you probably have a lot of misconceptions about their nature. For this reason, linguists who write about sign languages often begin their books with a discussion of myths and misconceptions. For example, Robbin Battison wrote a section on misconceptions about ASL, Trevor Johnston and Adam Schembri covered the same topic using data from Australian Sign Language, and Vadim Kimmelman and Svetlana Burkova discussed common mistakes in light of Russian Sign Language. Let us follow their example and bust a few myths!

Myth №1 There is only one sign language

Perhaps the most mind-blowing thing about sign languages is that there is more than one. Indeed, if we have never encountered sign languages in action, we most probably have a default assumption that there is one sign language, and everyone is using it. Why would you need more? Surely, at some point, someone came up with a list of signs for different objects and actions, and now all deaf and hard-of-hearing people use them.

“That Deaf Guy” comic by Matt & Kay Daigle

This is not true. Nowadays, we know about not one, not even ten, but one hundred and seventy different sign languages spread around the world. And it is very possible that there are other sign languages we are not yet even aware of. Check out the map from Glottolog, which provides a catalogue of the world’s languages:

Sign languages of the world

Each dot in this map represents a language with its own vocabulary and grammatical structure. The yellow dots are sign languages that developed in urban settings. The blue dots are so-called ‘rural’ sign languages that appeared in small village communities with a high rate of hereditary deafness. Finally, the rare red dots are ‘secondary sign languages’. These languages developed in hearing societies as a substitute for spoken languages in certain situations.

Yes, 170 sign languages is a much more modest number than roughly 6500 spoken languages, but it is definitely more than one. Now, let’s reflect on what sign languages actually are.

Myth №2 Sign languages are a kind of pantomime

Who likes Charades? In this classic team game, you need to enact a title of a book or a movie without saying a single word. Some of these titles can be quite tricky. Have you ever tried to mime “Star Wars Episode V: The Empire Strikes Back”? So, we put forward our best improvisation techniques and we create quite complicated sequences of body movements in order to express the idea we need.

Sign languages do the same thing, don’t they? They express different ideas with movements of the hands and other parts of the body. So, maybe sign languages and pantomime are in fact the same thing? Well, no, not really. You see, one very important feature of a pantomime is transparency. We are usually able to guess what is going on without anyone translating it for us. Sign languages are not so generous. Try to make sense of this short video in Russian Sign Language. I can even give you a hint: the title of this video is ‘Miracles of dog training’.

A short story ‘Miracles of dog training’ in Russian Sign Language

If you are not familiar with Russian Sign Language, you probably didn’t understand that an unlucky man, the main character of this tale, tried to teach his dog to bring him a stick. The dog didn’t quite grasp the concept and instead started bringing him umbrellas, which it would steal from unsuspecting passers-by.

Why is it so hard to understand a sign language? Let me answer this with a counterquestion: why would we expect it to be easy? Well, this assumption stems from the phenomenon called ‘iconicity’. A lot of signs in sign languages look like what they describe. For example, if you watch the video about the dog training again, you will easily find a sign for ‘holding a stick in the mouth’. A tricky thing about iconicity, however, is that it is evident once you know what the sign means. But can you guess the meaning of an iconic sign? Let’s give it a go! Here is a sign in Russian Sign Language. Can you guess what it means?

An iconic sign in Russian Sign Language

If you are done guessing, here is the answer. This sign means ‘empty’. Once we know this, it seems obvious that the person in this video imitates looking for something in an empty bag. But it is really hard to guess it beforehand.

Another reason for the non-transparency of sign languages is that, unlike pantomime improvised on the spot, sign languages have quite complex rules for forming sentences. Speaking of sentences, let’s bust another widespread myth that has to do with sign language structure.

Myth №3 Sign languages are spoken languages articulated with hands

Many people assume that sign languages are not independent languages, but instead are signed versions of spoken languages. For example, British, American and Australian Sign Languages are signed versions of English, French Sign Language is a version of French, Russian Sign Language is a version of Russian, and so on. From this point of view, if someone wanted to express a sentence in English with something other than their voice, they could write it down or sign it instead.

However, this is not the case. Many aspects of sign languages are completely unrelated to the spoken languages that surround them. Trevor Johnston and Adam Schembri provide a good illustration of this using Australian Sign Language as an example. The English word light has several meanings, such as ‘not heavy’ (as in a light bag), ‘pale’ (as in a light colour), or ‘energy from the sun or lamp that allows us to see things’ (as in turn on the light). Although in English all these meanings are expressed with the same word, they would be translated into Australian Sign Language with three different signs.

Australian Sign Language translations for the English word “light”

Of course, this is not the only kind of difference between sign and spoken languages. Grammars are different too. Sign languages do not have articles, such as a and the in English, or case marking, like Russian Genitive or Dative. They don’t mark plurality and past tense with special endings. Instead, they have their own ways to express time and quantity related information. Many of them revolve around iconicity. But this is a topic for a different post. Stay tuned!

Are words all different? Or are they all the same?

Imagine we have less than a life-time to describe the words of a given language. We might start from the view that each individual word is a treasure to be described in exquisite detail. Indeed, it is one of the achievements of our field that linguists have found and described gems like the following:

  • Archi (Dagestan) has the word t’uq’ˤ, which is a stone post inside an underground sheepfold, which supports the stone roof.
In Archi, the t’uq’ˤ is the stone post supporting the roof of a sheepfold (Photo credit: Dr. Marina Chumakina).
  • Soq (Papua New Guinea) has the verb s- ‘stay’, which is anti-irregular. While typical irregular verbs (like English go ∼ went) have unexpected forms but mean ‘the right thing’ (went means ‘go in the past’), the Soq verb s- ‘stay’ is the opposite of that. Its forms are unremarkable, but uniquely in the language, its present tense covers the time period of the English present (‘now’). All other verbs have a present tense (sometimes called ‘hodiernal’) which covers the period starting at nightfall yesterday and running through to and including ‘now’.
  • Krongo (Sudan) has the noun m-ùsí ‘sorcerer’, where the initial m- tells us it is singular. The plural is nú-kù-kk-ùs-óoní ‘sorcerers’ with no less than four plural markers, each of which is found independently with other nouns.
  • Russian skovorodá ‘frying pan’ seems remarkable only in that you have to wait for the last syllable to put the stress on the word. But in the plural, the stress moves forward three syllables: skóvorody ‘frying pans’, which makes it sound rather different.
  • English dust. Yes, even English has some star items. The humble verb dust is an example of ‘Gegensinn’, that is, it means its own opposite. We can dust a cake with icing sugar (that is, putting on particles), the opposite of dusting the furniture (removing particles).
    Dusting – even elephants love to do it!

    But there is a danger with this approach: we may well manage a few hundred items, and leave behind an unpublished dictionary. Or we may publish Volumes I-III (A-F), leaving the user stuck for words later in the alphabet: this happens particularly with larger projects, when grand intentions meet organizational and financial reality.

    The alternative approach is to start from the assumption that all words in a language are the same. We soon discover, of course, that this is not quite true. There are dramatic generalizations to be made: we may find, for instance, that many words can occur alone, and some cannot. More generally, different classes of words have different properties of combination with others. That is, we specify part of speech information (verb, noun, and so on). Consistent with this, wholly or partly, we may find regularities such as some words distinguishing tense while others do not. And real dictionaries embody such regularities as defaults. If an English dictionary specifies that compute is a verb, it is taken as given that it will have a past tense, that the form will be computed, and that this past tense will be compositional (we know that what it means is a combination of the lexical meaning of compute and the grammatical meaning of PAST). And when a default is overridden, the information is given in the dictionary entry. For example, the entry for go states that its past tense is went (and only the form need be given, since our default assumption about what it means will hold good), and the entry for binoculars states that it is a noun but lacks a singular.
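The idea of defaults with overrides can be made concrete in a few lines of Python. This is only a toy sketch: the `Entry` class, its field names and the spelling rules for the default forms are all assumptions made for illustration, not a description of how any real dictionary stores its data.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Entry:
    """A toy dictionary entry: a stem, a part of speech, and any exceptions."""
    stem: str
    pos: str  # "verb" or "noun"
    # Only irregular information is listed; None marks a missing form.
    overrides: Dict[str, Optional[str]] = field(default_factory=dict)


def forms(entry: Entry) -> Dict[str, Optional[str]]:
    """Fill in the default forms for the part of speech, then apply overrides."""
    if entry.pos == "verb":
        # Toy spelling rule: add -d after a final e, otherwise -ed.
        past = entry.stem + ("d" if entry.stem.endswith("e") else "ed")
        result: Dict[str, Optional[str]] = {"base": entry.stem, "past": past}
    elif entry.pos == "noun":
        result = {"singular": entry.stem, "plural": entry.stem + "s"}
    else:
        result = {"base": entry.stem}
    result.update(entry.overrides)
    return result


if __name__ == "__main__":
    print(forms(Entry("compute", "verb")))                     # defaults only
    print(forms(Entry("go", "verb", {"past": "went"})))        # form overridden
    print(forms(Entry("binocular", "noun", {"singular": None})))  # no singular
```

The entry for compute needs nothing beyond its part of speech, whereas go and binoculars each carry exactly one piece of exceptional information, which is the economy that the default approach buys.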

    I have described not one, but two straw men, though I have met real people who came close to these extremes. The point is that the interest of the linguistic gems we started with comes precisely from the way in which they stand out against the backdrop of the general picture. We know that there are general defaults – otherwise speakers and hearers would not cope. We expect singular and plural of a noun to be linked by a simple formula, rather than by a stress-shift that dramatically changes the way the noun sounds, as with Russian skovoroda ‘frying pan’. So in principle we can start from either end (words are all different or words are all the same), so long as we have the other horizon in view too.

    Don’t forget to destress when using a frying pan in Russian… But if you can’t take the heat, time to get out of the kitchen!

    Of course, real people tend to feel more comfortable working from one end or the other; lexicographers are, arguably, more interested in the differences and linguists more in the generalizations. And there are important movements within the field where dictionary-makers point out the need for much more detailed grammatical information about individual words, and conversely where linguists point out that the broad classes we often work with need to be broken down into rather finer detail.

    A saving grace in all this is the possibilities offered by online dictionaries. We can present some of the richness of words in new ways. For example, rather than trying to describe what the pillar that holds up the roof of an underground sheep fold looks like, we can give a picture. The online Archi dictionary does this. And it provides the sound file, so that users can hear what the word sounds like. Indeed they can hear all the basic forms needed to derive its large array of forms (its extensive paradigm). What if the system of sounds comprising the words has taken years of work to unravel? We want to hear the sounds and see the system. This is something – among other good things – that the new Nuer dictionary offers.

    Browsing in the Archi and Nuer dictionaries makes us marvel at how different those words are, one from another, and perhaps from ‘our’ words. And yet they are all the same too – they all use the same Archi and Nuer systems of sounds, and they fall into parts of speech which are interestingly comparable to ‘our’ parts of speech (verbs and nouns are distinct, and so on).

    It would need several lifetimes for anything approaching a ‘complete’ dictionary of Archi or Nuer. But there are plenty of surprises whichever perspective we take: the dictionary entries tell us about the amazing differences between languages, but the innocent little markers (like v. and n.), and the sets of forms given, point to the equally amazing sameness.

    If you enjoyed this post, why not check out our favourite untranslatable words from the languages we work on.

A “let’s circle back” guy

As everyone knows by now, for the foreseeable future we must all stay at home as much as possible to slow the spread of COVID-19 and reduce the burden on our health services – which has already been substantial, and will soon be enormous even in the best possible scenario.

This shift in the way we operate as a society will have a wide range of effects on our lives, which are already being noticed. Some of these were the kind of thing you might have thought of in advance – but others less so. For example, soon after the advice to work from home really started to bite in the US, a substantial thread developed on Twitter, all started off by the following tweet:

The thousands of responses that appeared within a few hours of this tweet show how deeply it resonated: many people must have been through their own version of the same surprising experience, some of them presumably in the last few days. But what happened here, and why was it so surprising? And why, as a linguist, am I sitting at home and writing a blog post about it now?

This single tweet, which people found so easy to identify with, in fact brings together a number of issues that linguists are interested in. For one thing, it works as a clear illustration of a point that people intuitively appreciate, but which has endless ramifications: the language you use is never just an instrument for communicating your thoughts, but is also taken to say something important about your identity, whether you intend it to or not. If a guy uses the expression “let’s circle back”, meaning to return to an issue later, that makes him a “let’s circle back” guy – that is, a particular kind of person. In a jokey way, the tweeter is implying that she already had a mental category of ‘the kind of person who would say things like that’, and she takes it for granted that we do too. In this case, the surprise for Laura Norkin was in suddenly discovering that her own husband belonged in that pre-existing category: the way she tells it, hearing him use a specific turn of phrase counted as finding out important new information about who he is as a person, which she was not necessarily best pleased about.

Making a linguistic choice: a bilingual road sign in Wales

Since the mid-twentieth century, the field of sociolinguistics has drawn attention to the fact that this kind of thing is going on everywhere in language. Consciously or unconsciously, people are making linguistic choices all the time – whether that means choosing between two totally different languages, between two different expressions with the same meaning (do you circle back to something or just return to it?), or between two very slightly different pronunciations of the same word. Any of these choices might turn out to ‘say something’ about how you see yourself – or how other people see you. And the social meanings and values assigned to the different choices are likely to change over time: so understanding what is going on with one person’s use of language really requires you to understand what is going on right across the community, which is like an ecosystem full of co-existing language diversity. How do linguistic developments, and the social responses to them, propagate and interact in this ecosystem? That’s something that researchers work hard to find out.

The tweet also picks up on the importance of the situational context for the way people use language. Laura Norkin had never heard her husband use the offending expression before because it belongs to a particular register – meaning a variety of language which is characteristic of a particular sphere of activity. Circling back is characteristic of ‘full work mode’, something which had never previously needed to surface in the domestic setting.

Why do registers exist? Partly it must be to do with the fact that different people know different things: for example, lawyers can expect to be able to use technical legal terminology with their colleagues, but not with their clients, even if they are talking about all the same issues – because behind the terminology there lies a wealth of specialist knowledge. Similarly, anyone would modify their language when talking to a five-year-old as opposed to a fifty-year-old.

But this cannot be the whole story: it doesn’t help you to explain the difference between returning and circling back. Should we think of the business/marketing/management world, where terms like circling back are stereotypically used, as a mini community within the community, with its own ideas of what counts as normal linguistic practice? Or is everyone involved giving a signal that they take on a new, businesslike identity when they turn up to the office – even if these days that doesn’t involve leaving the house? Again, working out the relationship between the language aspect and the social aspect here makes an interesting challenge for linguistics.

The medical profession is well known for having its own technical register

But this was not just an anecdote about how unusual it is to be at home and yet hear terms that usually turn up at work. We can tell that “let’s circle back”, just like other commonly mocked corporate expressions such as “blue-sky thinking” or “push the envelope”, is something we are expected to dislike – but why? The existence of different registers is not generally thought of as a bad thing in itself. You could give the answer that this expression is overused, a cliché, and thus sounds ugly. But really, things must be the other way round: English abounds in commonly used expressions, and only the ones that ‘sound ugly’ get labelled as overused clichés. And there is nothing inherently worse about circle back than about re-turn – in fact, when you think about it, they are just minor variations on the same metaphor.

So what is really going on here? The popular reaction to circle back, and other things of that kind, seems to involve lots of factors at once. The expression is new enough that people still notice it; but it is not unusual enough to sound novel or imaginative. It is currently restricted to a particular kind of professional setting that most people never find themselves in; but it does not refer to a complex or specific enough concept to ‘deserve’ to exist as a technical term. And we do not tend to worry too much about making fun of the linguistic habits of people who have a relatively privileged position in society: certainly, teasing your husband by outing him as a “let’s circle back” guy is not really going to do him any harm.

Spelling it out like this helps to suggest just how much information we are factoring in whenever we react to the linguistic behaviour of the people around us – and this is something we do all the time, mostly without even noticing. We are social beings, and cannot help looking for the social message in the things people say, as well as the literal message: establishing this fact, and working out how to investigate it scientifically, has been one of the great overarching projects of modern linguistics. Right now, for everyone’s benefit, we need to learn how to be less sociable than ever. But as the tweet above suggests, people’s inbuilt sensitivity to language as a social code is not going to change any time soon.

Arabic based scripts

Scripts spread like bad news. Look at the Latin script, which is the ultimate winner considering the hundreds, if not thousands, of languages that use it today. Political power and religion have caused the Latin script to serve as the basis for this proliferation of written languages, first in Europe, and then almost everywhere else, including many languages that had no written tradition before the Western influence. The exceptions are the scripts with a tradition strong enough to keep them going.

However, the Latin script is not the only prevalent one. Wikipedia lists 95 languages that are using, or have actively used the Arabic script. In this post we will be looking at how they do it.

The way different languages use a script can vary significantly. Some can invent new versions of letters that express the peculiar sounds of a language, such as the long vowels in Hungarian: á, í, é, ó, ú, ő, ű. Others, like English, combine existing letters to do the same job, like th or ch. Some will get rid of the letters that are not useful enough. Next time you visit Turkey, look at the taxi signs.

A Turkish taksi

One way we could classify writing systems is by how helpful they are to someone trying to read them. Chinese is famously not very helpful. Even though some characters will give a hint on how to pronounce the word, or what it means, generally you have to learn thousands of characters that refer to separate “words”. English is rather helpful in the sense that the letters generally help the reader figure out what sound is supposed to be pronounced. Not always, thouGH. Sometimes it is touGH to determine how to pronounce GH, for example. Is it /f/, /g/ or /nothing/? Learners have to learn the differences individually. The most helpful scripts represent a speech sound with a single letter consistently. Look at Turkish! Nobody needs an X if you have KS, which does the job perfectly at all times.

Arabic is similar to English in this classification, but in a completely different way. In order to understand what is going on, we must know what templatic morphology is. When creating new words, most languages add meaningful bits to the beginning or to the end of a word. Or both, as in the case of my favorite Metallica song, the Un-forgive-n. We can say that English, in most cases, uses a word as a base for such operations. Arabic, on the other hand, uses a set of consonants, typically three, as a base. They are not words; rather, they represent a broad concept. The schoolbook example is K-T-B, which represents the broad concept of writing. Arabic then adds things before, after and in between (i.e. applies the three consonants to a template). The templates also have meanings and thus narrow down the concept’s meaning to a word that can actually be used in the language. There are only two rules when inserting the three consonants into a template: 1) do not skip any consonant, and 2) keep their order. Let’s look at a few examples of how these templates work. The capital letters are the base consonants, and the small letters fill in the template.

Template meaning        K-T-B ‘write’         M-L-K ‘rule, possess’
place where it happens  maKTaBa ‘library’     maMLaKa ‘kingdom’
person who does it      KāTiB ‘writer’        MāLiK ‘king’
passive (being done)    maKTūB ‘written’      maMLūK ‘slave’
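
The rows of the table above can be reproduced with a short Python sketch. The notation is an assumption made for this example only: each template is written as a string in which the digits 1, 2 and 3 stand for the first, second and third root consonant, and the names TEMPLATES and apply_template are hypothetical.

```python
# Hypothetical template notation: digits mark the slots for root consonants.
TEMPLATES = {
    "place where it happens": "ma12a3a",
    "person who does it": "1ā2i3",
    "passive (being done)": "ma12ū3",
}


def apply_template(root, template):
    """Insert the consonants of a root such as 'K-T-B' into a template."""
    consonants = root.split("-")
    slots = [ch for ch in template if ch.isdigit()]
    # The two rules from the text: skip no consonant, and keep their order.
    expected = [str(i) for i in range(1, len(consonants) + 1)]
    if slots != expected:
        raise ValueError("template must use every consonant once, in order")
    return "".join(
        consonants[int(ch) - 1].upper() if ch.isdigit() else ch
        for ch in template
    )


for meaning, template in TEMPLATES.items():
    print(meaning, apply_template("K-T-B", template), apply_template("M-L-K", template))
```

The root supplies the capitals and the template supplies everything else, so one small set of templates combined with many roots yields a large vocabulary.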

Long story short, templates are extremely important in Arabic. This is combined with the unfortunate fact that Arabic has lots of consonants and very few vowels, namely /a/, /u/, and /i/. Each of them has a long and a short version, which gives a total of six vowels. By contrast, there are 28 consonants. Here is a really nice introduction to Arabic speech sounds.

The facts above have led to a writing system where vowels are so ‘underrated’ that they are basically not marked. In fact, the long vowels are marked, but by specific consonant letters, each of which may be pronounced as a consonant or read as a sign that marks a long vowel. To illustrate this, let’s look at some Arabic words: the raw information you get from the letters you see, some possible pronunciations (just for fun), and how you actually need to pronounce them.

raw information: [m] [w/ū] [r] [d]
possible pronunciations: mawarad, mūrad, mawrad, miward, muwarrid, muwarad…
actual pronunciation: mawrid
meaning: supplies

raw information: [m] [d] [y/ī] [n] [a]
possible pronunciations: midayna, mudayna, madayna, mudīna, midīna, madīna…
actual pronunciation: madīna
meaning: city
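
To get a feel for how much the reader has to fill in, here is a rough Python sketch that lists candidate readings for the two words above. It rests on simplifying assumptions that are mine, not the post's: each written letter is given as a tuple of its possible values, an unwritten short vowel (or nothing) may follow any letter except the last, and candidates with two vowels in a row are thrown out. Doubled consonants (as in muwarrid) and real Arabic sound patterns are ignored, so the function both overgenerates and undergenerates.

```python
from itertools import product

SHORT_VOWELS = ("", "a", "i", "u")   # "" means no vowel is inserted
VOWELS = set("aiuāīū")


def readings(letters):
    """Return the set of candidate readings for a sequence of written letters.

    letters: list of tuples, each tuple holding the possible values of one
    written letter, e.g. ("w", "ū") for a letter that is either w or long ū.
    """
    results = set()
    for values in product(*letters):
        # choose an optional short vowel to follow every letter but the last
        for fillers in product(SHORT_VOWELS, repeat=len(values) - 1):
            word = values[0]
            for filler, value in zip(fillers, values[1:]):
                word += filler + value
            # discard candidates containing two adjacent vowels
            if not any(a in VOWELS and b in VOWELS for a, b in zip(word, word[1:])):
                results.add(word)
    return results


supplies = readings([("m",), ("w", "ū"), ("r",), ("d",)])
city = readings([("m",), ("d",), ("y", "ī"), ("n",), ("a",)])
print("mawrid" in supplies, "madīna" in city)
print(len(supplies), len(city))
```

Even under these crude assumptions each skeleton allows many readings, and only knowledge of the templates tells the reader which one is meant.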

Arabic has a way of signaling exactly how a word should be pronounced, but these additional signs above and below the main letters (diacritics) are only used in children’s reading books and in the Qur’ān. Nothing above and below the red lines actually appears in everyday texts or in handwriting.

Arabic script

In essence, instead of marking vowels with high precision, Arabic marks the consonants and in most cases, you can figure out the template as well. And if you know Arabic, then you know all the templates, so you don’t even really need those unmarked vowels.

The Arabic writing system fits the Arabic language really neatly, but what about other languages? Persian uses the Arabic script, but it has no templates. It is an Indo-European language with word formation rules that are very similar to the ones we find in European languages. So, how did they deal with this situation? Well, they did their best to mark vowels with a bit more precision. At the ends of words, Persian uses the letter /h/ to mark the vowels /e/ and /a/. The consonants that can signal the presence of a long vowel in Arabic are used much more consistently, so when you see one, you can be almost sure that there is a long vowel. Apart from the vowel problem, Persian has also added a few consonants that Arabic lacks, such as /p/, /g/ or /ch/.

Urdu is spoken mainly in Pakistan, and it is quite similar to Hindi (a matter of political debate), but let’s stick to the fact that it has retroflex consonants (the tip of the tongue curls backwards). Those are the speech sounds that make many Indic languages sound so recognizable. Urdu’s strategy is similar to what we saw in Persian, with the addition of the retroflex consonants. There is also an additional, second form of the letter h that signals aspiration (the h-like sound after consonants, as in the words dharma, makhani or bhaji). The last addition is a differently shaped letter y that marks /ay/ or /ey/, as opposed to a long /ī/. In Persian and Arabic, there is only one letter that represents these three sounds.

Urdu is also special in that printed Urdu texts use a type of calligraphy called Nasta’liq. This makes Urdu texts look very different from Arabic, but it is only a matter of fonts.

Arabic newspaper
Urdu newspaper

Lastly, let’s discuss a language that has completely reformed the Arabic script. Uyghur is a Turkic language spoken in the Xinjiang Uyghur Autonomous Region in Northwest China. Like all Turkic languages, Uyghur has a large number of vowels and relatively few consonants. This makes the Arabic script a rather difficult choice for this language, unless some modifications are made. In the Uyghur script, every speech sound is represented in a consistent way, i.e. there is no ambiguity whatsoever. The set of consonants is essentially the same as in Persian, but there are nine additional letters that allow for a precise marking of vowels. To anybody else from the world of Arabic-based scripts, the resulting text may appear somewhat weird. The following image illustrates how different this script is from the previous ones. The parts circled are the Uyghur innovations that would be incorrect in Arabic, Persian or Urdu. Notice their proportion.

Uyghur script

The cherry on the cake is the Thaana script. It is used to write Dhivehi, an Indo-European language spoken in the Maldives. This script is based on Arabic, but in a unique way. Thaana started off as a secret script for sacred, religious texts. It was considered a way of encryption, and therefore the letters originate from Arabic letters, as well as Arabic numbers and Indic numbers (!). Imagine that you code a message that looks like this: 7q۳۶gt55۹۴. All speech sounds are precisely marked, as in Uyghur. Notice the vowel-marking diacritics above and below the main letters, and their similarity to the Arabic diacritics (in the picture above where the diacritics are separated with a red line). But of course, this script looks really different from the other ones we have seen.

Dhivehi newspaper

Linguists believe that only a handful of writing systems appeared independently around the world. Most languages had to adopt the script of another language, and due to different needs and strategies, we have ended up with a myriad of historically related, but still, different scripts. Linguists consider writing systems negligible, since they are just the representation of language, which we are truly interested in. I think, however, that the backgrounds of different scripts are amazing.

Eggcorns and mondegreens: a feast of misunderstandings

Have you ever felt that you needed to nip something in the butt, or had the misfortune to witness a damp squid? And what can Jimi Hendrix, Bon Jovi and Freddie Mercury tell us about language change?

Well, if you know Hendrix’s classic “Purple Haze”, you surely remember the moment where he interrupts his train of thought with the unexpected request, ‘Scuse me while I kiss this guy. Or perhaps you recall “Living on a Prayer”, where we hear that apparently It doesn’t make a difference if we’re naked or not. And who can forget the revelation, in “Bohemian Rhapsody”, that Beelzebub has a devil for a sideboard?

Wise words from Celine Dion

If you do remember these lyrics fondly, you are not alone – lots of people are familiar with these exact lines. There is just one problem, of course: none of those songs really say those things. Instead, the lyrics involved are ‘Scuse me while I kiss the sky; It doesn’t make a difference if we make it or not; and Beelzebub has a devil put aside for me. And yet thousands of English speakers the world over have had the experience of listening to “Purple Haze” and the others – and of misunderstanding the words, entirely independently, in exactly the same way.

Mishearings of this kind are common enough that they have been given a name of their own, mondegreens – a word invented by the American writer Sylvia Wright, who as a child heard a poem containing the following lines:

For they hae slain the Earl o’ Moray
And laid him on the green

and assumed that it listed not one but two victims – the unfortunate Earl himself, and “Lady Mondegreen”, a plausible character who happens not to feature in the real poem.

Why does this kind of thing happen? One reason has to do with the nature of spoken language. On the page, English sentences come pre-packaged into words, each of which is made up of distinct, easily-identified letters which look pretty much the same every time. But pronounced out loud, they are not like that! Instead, a continuous, mushy stream of noise makes its way into our ears, and it is up to our brains to work out what speech sounds are actually in there, where one word ends and the next one begins (think the-sky versus this-guy), and so on. Obviously this process is not exactly helped when there are rock guitars competing for your attention too.

Obama’s elf….. don’t wanna be… Obama’s elf… any more…

But another reason is that we are never ‘just listening’ passively. Instead, behind the scenes, our minds are busy trying to relate what we’re hearing to our existing knowledge – not only our linguistic knowledge, but our general knowledge about the world. For example, the common-sense knowledge that people tend to kiss other people, rather than intangible abstractions like the sky. This is obviously very useful most of the time, but in the “Purple Haze” case it leads us astray, because the more implausible meaning is the one that Jimi Hendrix intended.

What has this all got to do with language change? Well, the crucial point is that what I’ve just said – interpreting sounds is complicated, and to navigate the process we engage our common sense as well as our knowledge of the language – applies just as well to normal conversation as it does to song lyrics. We don’t always hear things perfectly, and even if we do, we have to square the things we’ve just heard with the things we already knew, which provide a guide for our interpretation but may sometimes take us in the wrong direction.

So if you hear someone referring to a really disappointing experience as a damp squib, but are not familiar with squib (an old-fashioned word for a firework), what is to stop you thinking that what you really heard was damp squid? A squid is, after all, a very damp creature, and not always something that people are hugely fond of. Similarly, the expression to nip in the bud makes sense if you latch on to the gardening metaphor it is based on – but if you don’t, well, nipping an undesirable thing in the butt does sound like a very effective way of getting rid of it. So, people who think the expressions really are damp squid and nip in the butt have made a mistake along the lines of “kiss this guy”; the difference is that here they may end up using the new versions in their own speech, and thus pass them on to other speakers. And the process doesn’t have to involve whole expressions: individual words are susceptible to it too, for example midriff becoming mid-rift or utmost becoming up-most.

It’s beautiful, but undeniably damp

Misinterpreted words and expressions like these, which have some kind of new internal logic of their own, are known as eggcorns. This is because egg-corn is exactly how some English speakers have reinterpreted the word acorn, on the basis that acorns are indeed egg-shaped seeds. And the development of a new eggcorn may not involve any mishearing at all, just reinterpretation of one word as another one that sounds exactly the same. Are you expected to toe the line or to tow the line? Are people given free rein or free reign? In each case the two expressions sound identical, and each brings with it some kind of coherent mental image. For the moment, toe the line and free rein are still considered to be the ‘correct’ versions of these idioms, but perhaps in the future that will no longer be the case.

As words and expressions are reinterpreted over time, the language changes little by little: in speech and in writing, people pass on their reinterpretations to one another, in a way which may eventually pass right through the language. The underlying factors producing eggcorns are the same as those producing mondegreens. But unlike the lyrics of “Purple Haze”, words and idioms don’t generally have a fixed author and don’t belong to anybody, meaning that if everyone started calling acorns eggcorns, then that just would be the correct word for them: the previous, now meaningless term acorn would be no more than a historical curiosity, and English as a whole would be very slightly different from how it is now.

So this is how we get from Jimi Hendrix to language change – via mondegreens and eggcorns. Have you spotted any eggcorns in the wild? And how likely do you think they are to catch on and become the new normal?

A narrow hope has fallen man, till Volapük shall reign

WHEN the tower of Babel looked up toward the sky.
Before the huge walls were complete,
They knew but one language, to which we apply,
The musical name “Volapuk.”

But a slight little trouble occurring one day,
They had to stop work, so to speak,
And drop all their tools and hurry away,
Because they forgot “Volapuk.”

And from that day to this men have been on the search.
For that long lost Volapuk
(Louis Eisenbeis, author of Come, swell the ranks of temperance)

Volapük may well have had the shortest lifespan of any known language, at least one that has had dictionaries and grammars devoted to it. It was the first serious attempt at an artificial ‘universal’ language. Devised in the 1880s by the German priest Johann Schleyer, it rapidly soared in popularity, attracting passionate followers the world over, but by the end of the century it was already being pronounced a dead language. Many factors probably led to its demise, not the least of which is that an artificial language is not a very good idea in the first place. And as artificial languages go, Volapük was as complicated as it was peculiar, nor could anyone ever even seem to agree on how it should be pronounced.

But although Volapük never really got off the ground in the real world, it did enjoy a shadowy life in fiction and as an object of idle speculation. So I offer here a virtual history of Volapük in a world that might have been, where we can sing with the poet: A narrow hope has fallen man, till Volapük shall reign.

The language enjoys a robust future in Alvarado Fuller’s 1890 novel A.D. 2000. The main character is put in suspended animation by means of an ‘ozone machine’, and wakes up in (wait for it…) the year 2000, where he puts his knowledge of Volapük to good use, since it has become the common language of ‘civilized nations’.

The ozone machine

A practical step in that direction was proposed in Oskar Kausch’s monumental Die Sprachwissenschaft in der Briefmarkenkunde ‘Linguistics in Philately’ (1894), an exhaustive study of the linguistic aspects of stamp collecting. Kausch moots the use of Volapük in international address labels. Didn’t happen.

Volapük stamps from China

Looking at things from the other perspective, the futuristic satire El clavo ‘The Nail’ (1967) by the artist and author Eugenio Granell imagines Volapük as a language spoken in some tribal past, which may be an alternative reality to our present (or past or future for that matter?).

In Maurice Renard’s gruesome and sardonic L’homme truqué ‘The Counterfeit Man’ (1921), Volapük has been taken up as the language of mad scientists. A French soldier in WWI is blinded in battle, captured by the Germans but then shipped off to a castle somewhere in Eastern Europe where a mysterious group of Volapük-speaking scientists are performing ghastly experiments on human subjects. (Highly recommended.)

Extraterrestrials got into the act as well. In James Cowan’s Daybreak (1896), Moon dwellers fire off bombs to Earth filled, among other things, with Volapük texts, thereby successfully introducing the language. This conflicts somewhat with a report from an Illinois newspaper the following year, in which a Close Encounter of the Third Kind was reported with a Volapük-speaking member of a Martian expeditionary force.

In the end, as always, it is Satan’s triumph. Or so reports a certain pseudonymous Doctor Bataille in Le Diable au XIX Siècle (1895). Sadly I have not been able to source the original, but as paraphrased in the following year by Arthur Edward Waite in Devil-Worship in France, he reports having discovered that the English had excavated caverns in Gibraltar to house workshops for the manufacture of Satanic idols. These are staffed by English convicts who

commonly communicate with each other in the language of Volapuk. The reason given is that this language has been adopted by the Spoeleic Rite, which I confess that I had not heard of previously, but I venture to think that the doctor has concealed the true reason, and that Volapuk has been thus chosen because it is a diabolical invention ; a universal language prevailed previously to the confusion of Babel, and the new language is an irreligious attempt to produce ordo ab chao by a return to unity of speech.