Eggcorns and mondegreens: a feast of misunderstandings

Have you ever felt that you needed to nip something in the butt, or had the misfortune to witness a damp squid? And what can Jimi Hendrix, Bon Jovi and Freddie Mercury tell us about language change?

Well, if you know Hendrix’s classic “Purple Haze”, you surely remember the moment where he interrupts his train of thought with the unexpected request, ‘Scuse me while I kiss this guy. Or perhaps you recall “Living on a Prayer”, where we hear that apparently It doesn’t make a difference if we’re naked or not. And who can forget the revelation, in “Bohemian Rhapsody”, that Beelzebub has a devil for a sideboard?

Wise words from Celine Dion

If you do remember these lyrics fondly, you are not alone – lots of people are familiar with these exact lines. There is just one problem, of course: none of those songs really say those things. Instead, the lyrics involved are ‘Scuse me while I kiss the sky; It doesn’t make a difference if we make it or not; and Beelzebub has a devil put aside for me. And yet thousands of English speakers the world over have had the experience of listening to “Purple Haze” and the others – and of misunderstanding the words, entirely independently, in exactly the same way.

Mishearings of this kind are common enough that they have been given a name of their own, mondegreens – a word invented by the American writer Sylvia Wright, who as a child heard a poem containing the following lines:

For they hae slain the Earl o’ Moray
And laid him on the green

and assumed that it listed not one but two victims – the unfortunate Earl himself, and “Lady Mondegreen”, a plausible character who happens not to feature in the real poem.

Why does this kind of thing happen? One reason has to do with the nature of spoken language. On the page, English sentences come pre-packaged into words, each of which is made up of distinct, easily-identified letters which look pretty much the same every time. But pronounced out loud, they are not like that! Instead, a continuous, mushy stream of noise makes its way into our ears, and it is up to our brains to work out what speech sounds are actually in there, where one word ends and the next one begins (think the-sky versus this-guy), and so on. Obviously this process is not exactly helped when there are rock guitars competing for your attention too.

Obama’s elf….. don’t wanna be… Obama’s elf… any more…

But another reason is that we are never ‘just listening’ passively. Instead, behind the scenes, our minds are busy trying to relate what we’re hearing to our existing knowledge – not only our linguistic knowledge, but our general knowledge about the world. For example, the common-sense knowledge that people tend to kiss other people, rather than intangible abstractions like the sky. This is obviously very useful most of the time, but in the “Purple Haze” case it leads us astray, because the more implausible meaning is the one that Jimi Hendrix intended.

What has this all got to do with language change? Well, the crucial point is that what I’ve just said – interpreting sounds is complicated, and to navigate the process we engage our common sense as well as our knowledge of the language – applies just as well to normal conversation as it does to song lyrics. We don’t always hear things perfectly, and even if we do, we have to square the things we’ve just heard with the things we already knew, which provide a guide for our interpretation but may sometimes take us in the wrong direction.

So if you hear someone referring to a really disappointing experience as a damp squib, but are not familiar with squib (an old-fashioned word for a firework), what is to stop you thinking that what you really heard was damp squid? A squid is, after all, a very damp creature, and not always something that people are hugely fond of. Similarly, the expression to nip in the bud makes sense if you latch on to the gardening metaphor it is based on – but if you don’t, well, nipping an undesirable thing in the butt does sound like a very effective way of getting rid of it. So, people who think the expressions really are damp squid and nip in the butt have made a mistake along the lines of “kiss this guy”; the difference is that here they may end up using the new versions in their own speech, and thus pass them on to other speakers. And the process doesn’t have to involve whole expressions: individual words are susceptible to it too, for example midriff becoming mid-rift or utmost becoming up-most.

It’s beautiful, but undeniably damp

Misinterpreted words and expressions like these, which have some kind of new internal logic of their own, are known as eggcorns. This is because egg-corn is exactly how some English speakers have reinterpreted the word acorn, on the basis that acorns are indeed egg-shaped seeds. And the development of a new eggcorn may not involve any mishearing at all, just reinterpretation of one word as another one that sounds exactly the same. Are you expected to toe the line or to tow the line? Are people given free rein or free reign? In each case the two expressions sound identical, and each brings with it some kind of coherent mental image. For the moment, toe the line and free rein are still considered to be the ‘correct’ versions of these idioms, but perhaps in the future that will no longer be the case.

As words and expressions are reinterpreted over time, the language changes little by little: in speech and in writing, people pass on their reinterpretations to one another, in a way which may eventually pass right through the language. The underlying factors producing eggcorns are the same as those producing mondegreens. But unlike the lyrics of “Purple Haze”, words and idioms don’t generally have a fixed author and don’t belong to anybody, meaning that if everyone started calling acorns eggcorns, then that just would be the correct word for them: the previous, now meaningless term acorn would be no more than a historical curiosity, and English as a whole would be very slightly different from how it is now.

So this is how we get from Jimi Hendrix to language change – via mondegreens and eggcorns. Have you spotted any eggcorns in the wild? And how likely do you think they are to catch on and become the new normal?

A narrow hope has fallen man, till Volapük shall reign

WHEN the tower of Babel looked up toward the sky.
Before the huge walls were complete,
They knew but one language, to which we apply,
The musical name “Volapuk.”

But a slight little trouble occurring one day,
They had to stop work, so to speak,
And drop all their tools and hurry away,
Because they forgot “Volapuk.”

And from that day to this men have been on the search.
For that long lost Volapuk
(Louis Eisenbeis, author of Come, swell the ranks of temperance)

Volapük may well have had the shortest lifespan of any known language, at least one that has had dictionaries and grammars devoted to it. It was the first serious attempt at an artificial ‘universal’ language. Devised in the 1880s by the German priest Johann Schleyer, it rapidly soared in popularity, attracting passionate followers the world over, but by the end of the century it was already being pronounced a dead language. Many factors probably led to its demise, not the least of which is that an artificial language is not a very good idea in the first place. And as artificial languages go, Volapük was as complicated as it was peculiar, nor could anyone ever even seem to agree on how it should be pronounced.

But although Volapük never really got off the ground in the real world, it did enjoy a shadowy life in fiction and as an object of idle speculation. So I offer here a virtual history of Volapük in a world that might have been, where we can sing with the poet: A narrow hope has fallen man, till Volapük shall reign.

The language enjoys a robust future in Alvarado Fuller’s 1890 novel A.D. 2000. The main character is put in suspended animation by means of an ‘ozone machine’, and wakes up in (wait for it…) the year 2000, where he puts his knowledge of Volapük to good use, since it has become the common language of ‘civilized nations’.

The ozone machine

A practical step in that direction was proposed in Oskar Kausch’s monumental Die Sprachwissenschaft in der Briefmarkenkunde ‘Linguistics in Philately’ (1894), an exhaustive study of the linguistic aspects of stamp collecting. Kausch moots the use of Volapük in international address labels. Didn’t happen.

Volapük stamps from China

Looking at things from the other perspective, the futuristic satire El clavo ‘The Nail’ (1967) by the artist and author Eugenio Granell imagines Volapük as a language spoken in some tribal past, which may be an alternative reality to our present (or past or future for that matter?).

In Maurice Renard’s gruesome and sardonic L’homme truqué ‘The Counterfeit Man’ (1921), Volapük has been taken up as the language of mad scientists. A French soldier in WWI is blinded in battle, captured by the Germans but then shipped off to a castle somewhere in Eastern Europe where a mysterious group of Volapük-speaking scientists are performing ghastly experiments on human subjects. (Highly recommended.)

Extraterrestrials got into the act as well. In James Cowan’s Daybreak (1896), Moon dwellers fire off bombs to Earth filled, among other things, with Volapük texts, thereby successfully introducing the language. This conflicts somewhat with a report from an Illinois newspaper the following year of a Close Encounter of the Third Kind with a Volapük-speaking member of a Martian expeditionary force.

In the end, as always, it is Satan’s triumph. Or so reports a certain pseudonymous Doctor Bataille in Le Diable au XIX Siècle (1895). Sadly I have not been able to source the original, but as paraphrased in the following year by Arthur Edward Waite in Devil-Worship in France, he reports having discovered that the English had excavated caverns in Gibraltar to house workshops for the manufacture of Satanic idols. These are staffed by English convicts who

commonly communicate with each other in the language of Volapuk. The reason given is that this language has been adopted by the Spoeleic Rite, which I confess that I had not heard of previously, but I venture to think that the doctor has concealed the true reason, and that Volapuk has been thus chosen because it is a diabolical invention ; a universal language prevailed previously to the confusion of Babel, and the new language is an irreligious attempt to produce ordo ab chao by a return to unity of speech.

The stretchiness of Oceanic possessive classifiers

In many of the Oceanic languages, if you want to talk about someone’s tomatoes you have to use a special word that tells you how the owner of the tomatoes intends to use them. These special words – possessive classifiers – centre around culturally important interactions. You can’t simply say ‘his tomatoes’ but have to say something like ‘his tomato [which he will eat]’ or ‘his tomato [which is on his land]’.

Here are a couple of examples from Vatlongos, spoken in Vanuatu:

Tomato an ‘his tomato [to eat]’
Tomato san ‘his tomato [growing on his land]’
Tomato nan ‘his tomato [for other purposes]’

What’s really interesting is that as there are so many languages within Oceania – around 500 – there is lots of variation in how speakers use the classifiers with various possessions. Some languages are really stretchy, in that a possession can occur with many different classifiers as long as the speaker can think of a plausible situation or context – just like in Vatlongos above. Other languages are less stretchy and somewhat sticky, in that a possession can only ever occur with one classifier. So, in North Ambrym (Vanuatu), as we will see more of below, the word for tomato would only ever occur with the edible classifier, regardless of the context.

As part of our project on Optimal Categorisation we tested speakers from six different Oceanic languages on how sticky and stretchy their possessive classifiers are. We looked at Merei, Lewo, Vatlongos, North Ambrym (Vanuatu), Nêlêmwa and Iaai (New Caledonia). Each of these languages has a different possessive classifier inventory, from 2 (Merei) to 23 (Iaai). In order to test their stretchiness, we created video clips of people interacting with different objects and asked the speaker to describe what was happening with reference to the person’s possessions.


Intended context: ‘he is drinking his water’


Intended context: ‘he is washing with his water’

Some of the contexts we wanted to test were a bit strange as we wanted to see if speakers would use the classifiers in the same manner for both typical interactions (like the videos of drinking and washing with water above) and atypical interactions (like eating coffee or drinking raw eggs!). This way speakers would be confronted with strange situations that wouldn’t normally occur in their culture (or anyone’s culture for that matter – seriously who drinks raw eggs anyway?).

Some people do like to drink raw eggs!

A well-behaved possessive classifier system should allow the same word used for a possession to occur with a different classifier depending on the various contexts.

For brevity’s sake let’s just look at two languages from our sample – Lewo and North Ambrym – spoken on the neighbouring islands of Epi and Ambrym in Vanuatu. With very typical interactions like drinking water and washing with water the speakers of Lewo changed the classifier to match the interactional context. For drinking water, all 20 speakers tested used the drinkable classifier along with the word for water. Similarly, for washing with the water, the vast majority of speakers, 16 out of 20, used the general classifier along with the word for water. This is pretty much what we expect from a well-behaved classifier system where a drinking context evokes the drinkable classifier, and a more general context evokes a general classifier.

What about more atypical interactions? Let’s compare the video of someone eating eggs (typical) and the video of the man drinking eggs (very atypical!). All 20 speakers of Lewo used the edible classifier when talking about someone eating eggs. For the drinking eggs context only 9 speakers used the drinkable classifier, with the rest using either the edible classifier or the general classifier.

The classifier system of Lewo works well for typical interactions (stretchy!), but not so well for atypical interactions (a little bit stretchy and a little bit sticky!).

Now let’s compare Lewo to North Ambrym. For both the typical interactions of drinking and washing with water our 23 North Ambrym speakers gave the drinkable classifier. What? This is not what we expected! We expected that there would be a shift to the general classifier for the video of the man washing with water. North Ambrym is not behaving like an exemplary classifier system – it is much stickier than Lewo’s system.

What about atypical interactions? For the video of the person eating eggs, all speakers used the edible classifier, as expected. For the atypical drinking of eggs, 21 out of 23 speakers gave the edible classifier too! So a very sticky result with speakers using the same classifier regardless of the contextual interaction.

So what do our results show? That the classifier system in Lewo functions more like a well-behaved classifier system than North Ambrym’s does. Lewo’s classifier system behaves well in typical everyday situations, but not so well in atypical situations where speakers must make judgements on the fly. North Ambrym, however, doesn’t look at all like a well-behaved classifier system. On the stretchiness-stickiness scale, North Ambrym is much more on the sticky side than Lewo is.
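
For readers who like to see the numbers side by side, here is a minimal Python sketch that turns the counts quoted above into proportions. It is only an illustration, not the analysis used in the project: the names `counts` and `share` are made up for this example, and it includes only the figures reported in the text.

```python
# Counts quoted in the text: (speakers giving the named classifier, speakers tested).
# The variable and function names are illustrative assumptions.
counts = {
    ("Lewo", "drinking water", "drinkable"): (20, 20),
    ("Lewo", "washing with water", "general"): (16, 20),
    ("Lewo", "eating eggs", "edible"): (20, 20),
    ("Lewo", "drinking eggs", "drinkable"): (9, 20),
    ("North Ambrym", "drinking water", "drinkable"): (23, 23),
    ("North Ambrym", "washing with water", "drinkable"): (23, 23),
    ("North Ambrym", "eating eggs", "edible"): (23, 23),
    ("North Ambrym", "drinking eggs", "edible"): (21, 23),
}

def share(language, context, classifier):
    """Proportion of tested speakers who gave this classifier in this context."""
    used, tested = counts[(language, context, classifier)]
    return used / tested

# Print one line per reported result, with the raw count and the percentage.
for (language, context, classifier), (used, tested) in counts.items():
    print(f"{language:13} {context:19} {classifier:10} {used}/{tested} = {used / tested:.0%}")
```

On these figures, 45% of the Lewo speakers switched to the drinkable classifier for drinking eggs, while 91% of the North Ambrym speakers kept the edible classifier in the same context.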

Lost in Translation: the Morph team’s top 10 untranslatable words

To celebrate the end of UNESCO’s International Year of Indigenous Languages we thought we would take a look at some of the Indigenous languages that we are researching and present some of our favourite words. Now these aren’t just any old words: they are words that can’t be directly translated into English using a single word and must be translated using a rather long-winded explanation. Each of these words offers unique cultural insights into the speakers of these languages. We will be skipping across the continents to all the exciting places where we conduct our research…

South Sudan and Ethiopia
Our first stop on our world tour of untranslatable words is South Sudan and Ethiopia, where two closely related West Nilotic languages are spoken – Nuer and Reel. The Nuer tribe is one of the largest ethnic groups in South Sudan, with around a million speakers, whereas Reel is spoken by around 50,000 speakers from the Atwot tribe.

The Nuer and Atwot peoples are traditionally pastoralists. Cattle play an important role in every aspect of traditional life. The Nuer and Atwot also rely to some extent on horticulture for their living. They lead a semi-nomadic lifestyle determined by the availability of pasture grounds.

Speakers of Reel in Juba, South Sudan

1. tɛ́ɛt ‘to claim something back that was previously given out for good’

The Nuer verb tɛ́ɛt roughly translates as ‘to claim something back that was previously given out for good’. It is used in situations where an item has been given to someone for good but is later recalled. For example, it is customary to give cattle to the parents of a bride. If, for some reason, the couple wants to separate, the cattle have to be returned before the woman can go back to her parents.

2. wé̤eer ‘search by parting something’

The next word comes from Nuer’s neighbours – the Reel-speaking Atwot tribe. The verb wé̤eer translates as ‘search by parting something’. This word is used when the searching involves moving apart items that sit together densely, such as maize or bushes.

è-wé̤eer				dṳ̂t
DECL-search.by.parting.3SG	old.grass.PL
‘S/he is searching by parting old grass.’

Kazakhstan
Moving on to Central Asia and to the largest landlocked country in the world. Kazakh is the national language of Kazakhstan, though also spoken in Xinjiang province of China and in parts of Mongolia.

3. Tusau Keser ‘the cutting of the tether’

One of the first Kazakh rituals that a child goes through is Tusau Keser (Тұсау кесер) – which means ‘the cutting of the tether’. When a young Kazakh starts to walk, their parents organize a party and the child’s legs are tied together with colourful threads. This colourful tether is then cut to welcome the child to the next stage of their life.

It is believed that if the Tusau Keser ceremony is not performed, the child will be unlucky or have problems walking in their adulthood. In some parts of Kazakhstan they tie the legs with the fatty intestines of a horse, which – in case you were wondering – represents wealth!

The Tusau Keser ceremony

The beginnings of this ceremony lie in the pastoral culture of the Kazakhs. The legs of young horses and sheep are tethered in order to tame them, and the tether is only cut when they are old enough not to wander away from the rest of the animals. Therefore, the day an animal’s tusau ‘tether’ is cut is meant to be the beginning of a new life stage.

4. Süyinshi ‘be happy’
Süyinshi (сүйінші) literally means ‘be happy’, but this word is used only in one specific situation. If something really great has happened to someone and they want to share the good news with their friends, they have to shout süyinshi before telling everyone the news. What’s great about this word is that when someone shouts süyinshi, the friends get to ask for any kind of present they want from the person shouting süyinshi. Normally this mini ritual starts with friends asking for houses, cars or livestock, and then ends up in the pub where the vodka is bought for the friends instead.

Dagestan
On the other side of the Caspian Sea, in the Caucasus, lies Dagestan, home to one of the SMG’s favourite languages – Archi. With only around 1300 speakers, Archi is considered an endangered language.

5. biční ‘lower corner of a sack or bag’

Not only are the Archi people famous for their lamb, thanks to the proximity of lush alpine pastures, but they also make rather beautiful bags called tus:əra. The lower corners of these handmade bags have a special term – biční. The corners of larger sacks, used for carrying grain, were the best place to hold on to in order to upend the sack and pour out the contents. The corners of the smaller bags are also embellished with rather beautiful tassels. What’s even more interesting about these corners is that one corner is called biční, but two corners are called boʒdo. Archi uses a different word form (known as a suppletive form) for the plural. This goes against the claim that suppletives are only used for frequently occurring words, with the lower corner of a bag probably not cropping up in many everyday conversations.

one biční, two boʒdo

These beautiful bags were originally used in everyday life, but nowadays they are reserved for traditional ceremonies. At wakes these bags are filled with traditional foods such as sweetmeats.

Vanuatu
Skipping across to the South Pacific and to the most linguistically dense place in the world – the archipelago of Vanuatu. The Oceanic language of North Ambrym, with around 5000 speakers, not only has interesting possessive classifiers but also a whole host of culturally specific and directly untranslatable words. The Ni-Vanuatu (people from Vanuatu) are self-sufficient farmers with plenty of land to grow yams, manioc, bananas and raise pigs.

6. fafar ‘to wipe your bottom on a tree trunk’

By far this is my favourite word from North Ambrym. If there are no suitable leaves around after doing your business in the bush it makes sense to use a tree trunk. Of course, not every tree trunk can be used for this sort of thing. Please avoid large and knobbly trunks – slender smooth trunks are advisable!

7. yangyangne ‘to shoot an arrow to follow its course in order to find a lost arrow’

Not paying attention when you were off shooting wild birds in the jungle with your arrows? Well shoot another one with the same power and in the same direction and make sure you pay attention this time and you may find your lost arrow. Bad golfers could probably use this trick to find their lost balls in the rough!

A bow and arrow from northern Ambrym

Siberia
Now off to eastern Siberia and to the Tungusic language of Negidal which sadly only has a handful of speakers left.

8. un’i ‘be upset, get ill because someone ate in your presence and did not offer to share the food’

You should stay away from scrooges this Christmas as it would be a shame if someone ate a mouth-watering turkey roast with all the trimmings in front of you and did not offer you any! Negidal speakers can also use this verb in other situations, not just for when people eat food in front of you. Un’i can be used for any unfulfilled desire which makes you ill, such as wanting to smoke a cigarette when there are none left or wanting to see a close friend who is far away. The depression that you feel can be so great sometimes that it is said that you can die from it.

Lapland
Seeing as Christmas is almost upon us, what better place to end our untranslatable journey than in Lapland and the language of Skolt Saami? Skolt Saami is spoken in the far northeast of Finland by only around 300 speakers. Traditionally the Skolt Saami are reindeer herders, and herding is still important to this day. The Skolt Saami have many specific terms for their reindeer.

9. saʹmjaʹd ‘black reindeer’

The word saʹmjaʹd isn’t made up of the words for black and reindeer in the language; it is a specific word that describes black reindeer. If you want to talk about reindeer in general then you would use puäʒʒ, and the word for black is čaʹppes.

10. čiõrmiǩ ‘one year old reindeer’

Only the strong survive in Lapland and there is even a special term for those strong young reindeer who make it through their first year.

With many words for the different types of reindeer we were hoping to find one that meant ‘reindeer with a red nose’, but sadly couldn’t find one!

Merry Christmas from all of us at MORPH!

With thanks to Marina Chumakina, Tatiana Reed, Dávid Györfi, Tim Feist and Greville Corbett for their contributions.

Cushty Kazakh

With thousands of miles between the East End of London and the land of Kazakhs, cushty was the last word one expected to hear one warm spring afternoon in the streets of Astana (the capital of Kazakhstan, since renamed Nur-Sultan). The word cushty (meaning ‘great, very good, pleasing’) is usually associated with the Cockney dialect of the English language which originated in the East End of London.

Del Boy from Only Fools and Horses

Check out Del Boy’s Cockney sayings (Cushty from 4:04 to 4:41).

Cockney is still spoken in London now, and the word is often used to refer to anyone from London, although a true Cockney would disagree with that, and would proudly declare her East End origins. More specifically, a true ‘Bow-bell’ Cockney comes from the area within hearing distance of the church bells of St. Mary-le-Bow, Cheapside, London.

Due to its strong association with modern-day London, the word ‘Cockney’ might be perceived as being one with a fairly short history. This could not be further from the truth as its etymology goes back to a late Middle English 14th century word cokenay, which literally means a “cock’s egg” – a useless, small, and defective egg laid by a rooster (which does not actually produce eggs). This pejorative term was later used to denote a spoiled or pampered child, a milksop, and eventually came to mean a town resident who was seen as affected or puny.

The pronunciation of the Cockney dialect is thought to have been influenced by Essex and other dialects from the east of England, while the vocabulary contains many borrowings from Yiddish and Romany (cushty being one of those borrowings – we’ll get back to that in a bit!). One of the most prominent features of Cockney pronunciation is the glottalisation of the sound [t], which means that [t] is pronounced as a glottal stop: [ʔ]. Another interesting feature of Cockney pronunciation is called th-fronting, which means that the sounds usually represented by the letter combination th ([θ] as in ‘thanks’ and [ð] as in ‘there’) are replaced by the sounds [f] and [v]. These (and some other) phonological features characteristic of the Cockney dialect have now spread far and wide across London and other areas, partly thanks to the popularity of television shows like “Only Fools and Horses” and “EastEnders”.

As far as grammar is concerned, the Cockney dialect is distinguished by the use of me instead of my to indicate possession; heavy use of ain’t in place of am not, is not, are not, has not, have not; and the use of double negation which is ungrammatical in Standard British English: I ain’t saying nuffink to mean I am not saying anything.

Having borrowed words, Cockney also gave back generously, with derivatives from Cockney rhyming slang becoming a staple of the English vernacular. The rhyming slang tradition is believed to have started in the early to mid-19th century as a way for criminals and wheeler-dealers to code their speech beyond the understanding of police or ordinary folk. The code is constructed by way of rhyming a phrase with a common word, but only using the first word of that phrase to refer to the word. For example, the phrase apples and pears rhymes with the word stairs, so the first word of the phrase – apples – is then used to signify stairs: I’m going up the apples. Another popular and well-known example is dog and bone – telephone, so if a Cockney speaker asks to borrow your dog, do not rush to hand over your poodle!
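
The recipe just described (find a rhyming phrase, keep only its first word) is simple enough to sketch in a few lines of Python. This is purely an illustration using the two examples from the text; the names `RHYMING_PHRASES`, `to_slang` and `encode` are invented for this sketch, and real rhyming slang is of course far less mechanical.

```python
# Rhyming phrases mentioned in the text; the dictionary and function
# names are illustrative assumptions, not an established resource.
RHYMING_PHRASES = {
    "stairs": "apples and pears",
    "telephone": "dog and bone",
}

def to_slang(word):
    """Return the first word of the rhyming phrase, or the word itself if none is known."""
    phrase = RHYMING_PHRASES.get(word)
    return phrase.split()[0] if phrase else word

def encode(sentence):
    """Replace every known word in a sentence with its rhyming-slang stand-in."""
    return " ".join(to_slang(word) for word in sentence.split())

print(encode("I'm going up the stairs"))
print(encode("Can I borrow your telephone"))
```

Because only the first word of the phrase survives, the rhyme itself disappears from the coded sentence, which is exactly what makes it hard for outsiders to follow.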


Test your knowledge of Cockney rhyming slang!

Right, so did I encounter a Cockney walking down the field of wheat (street!) in Astana saying how cushty it was? Perhaps it was a Kazakh student who had recently returned from his studies in London and couldn’t quite switch back to Kazakh? No and no. It was a native speaker of Kazakh reacting in Kazakh to her interlocutor’s remark on the new book she’d purchased by saying күшті [kyʃ.tɨˈ] which sounds incredibly close to cushty [kʊˈʃ.ti]. The meanings of the words and contexts in which they can be used are remarkably similar too. The Kazakh күшті literally means ‘strong’; colloquially, however, it is used to mean ‘wonderful, great, excellent’ – it really would not be out of place in any of Del Boy’s remarks in the YouTube video above! Surely, the two kushtis have to be related, right? Well…

Recall that cushty is a borrowing from Romany (Indo-European) kushto/kushti, which, in turn, is known to have borrowed from Persian and Arabic. In the case of the Romany kushto/kushti, the borrowing could have been from the Persian khoši meaning ‘happiness’ or ‘pleasure’. It would have been very neat if this could be linked to the Kazakh күшті; however, there seems to be no connection there… Kazakh is a Turkic language and the etymology of күшті can be traced back to the Old Turkic root küč meaning ‘power’, which does not seem to have been borrowed from or connected with Persian. Certainly, had we been able to go back far enough, we might have found a common Indo-European-Turkic root in some Proto-Proto-Proto-Language. As things stand now, all we can do is admire what appears to be a wonderful coincidence, and enjoy the journeys on which a two-syllable word you’d overheard in the street might take you.

A picture is worth a thousand words: Choosing images for psycholinguistic research

Linguists need to come up with different ways of testing our theories of how particular languages in the world function. We generally rely on two main methods of data collection – linguistic elicitation and corpus collection. With linguistic elicitation a linguist asks a speaker of a language: ‘How do you say “Monty Python is really funny” in your language?’ But can we be sure that what the speaker said is naturalistic and not just a word for word translation?

Linguists need naturalistic data and can also record stories and conversations to build up a representative sample of a language (a corpus). This however takes a lot of time, effort and dedication on the part of both the linguist and the community of speakers of a language. It might even be that – after years of toil – the particular construction that a linguist wants to look at is under-represented with a dearth of examples in the corpus.

Thankfully, there is a happy medium! We can combine cognitive psychological techniques and targeted linguistic elicitation, to create scenarios where speakers produce naturalistic responses. Of course, this technique brings with it another set of problems entirely.

Psycholinguistic experiments need to be carefully designed and can’t be made up on the fly in response to something a speaker of a language says to you; this is drastically different to standard linguistic elicitation where one can continually come up with new sentences to check, while in the middle of working with a speaker of a language.

In our current research on optimal categorisation we aim to find out how different nouns are assigned to different classifiers in a group of six related Oceanic languages spoken in Vanuatu and New Caledonia. Each language has a different inventory size of classifying particles — from two to 23 — which are used in possessive constructions, and categorise the possession in terms of its use or functionality.

Here are a few examples from the Iaai language, spoken in New Caledonia, which has the largest inventory of classifiers in our sample of languages:

(1a)	a-n			wââ	(b)	hanii-ny		wââ
        FOOD.CLASSIFIER-his	fish 		CATCH.CLASSIFIER-his	fish
        ‘his fish (to eat)’		        ‘his fish (which he caught)’
(2a)	a-n			koko	(b)	noo-n			koko
	FOOD.CLASSIFIER-his	yam		PLANT.CLASSIFIER-his	yam
	‘his yam (to eat)’			‘his yam plant’

We want to see whether or not a particular noun that refers to a particular entity can occur with different classifiers, as with the words for ‘fish’ and ‘yam’ in Iaai above. Also, how does a language with 23 classifiers function differently from a language with just two or three classifiers?

One way in which we can discover how the classifiers function in each language is to use a card sorting experiment. These experiments present speakers with entities in the form of pictures. Speakers are asked to sort them into different groups, first in a “free sort” where they can create groups on any basis they feel is relevant and important, and second, in a “structured sort” where they are asked to group entities according to which classifier they would use in a possessive construction. By doing this with lots of participants we can see individual speaker variation in language usage in one language and across languages and get a clear sense of if and how a language’s classifier system is influencing the way that speakers think about and process different entities.
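One way to start looking at card sort results is to count how often two pictures end up in the same pile across participants. The sketch below is only an illustration under stated assumptions: the function name `co_sort_counts`, the data layout (one list of piles per participant) and the toy data are all hypothetical and are not taken from our actual experiment.

```python
from collections import Counter
from itertools import combinations


def co_sort_counts(sorts):
    """Count, for each pair of items, how many participants put both in one pile.

    `sorts` is a list with one entry per participant; each entry is a list
    of piles, and each pile is a set of item names (a hypothetical layout).
    """
    counts = Counter()
    for piles in sorts:
        for pile in piles:
            # Sort the names so each pair is always stored in the same order.
            for a, b in combinations(sorted(pile), 2):
                counts[(a, b)] += 1
    return counts


# Invented toy data: three participants sorting four picture cards.
sorts = [
    [{"fish", "yam"}, {"canoe", "house"}],
    [{"fish", "yam", "canoe"}, {"house"}],
    [{"fish"}, {"yam"}, {"canoe", "house"}],
]

counts = co_sort_counts(sorts)
for pair, n in sorted(counts.items()):
    # Share of participants who grouped this pair together.
    print(pair, n / len(sorts))
```

Pairs that are grouped together by most participants in the structured sort, but not in the free sort, would be one possible sign that the classifier system rather than general similarity is driving the grouping.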

Once we have decided on which nouns to test in a card sort experiment, we have to find or make pictures that represent them. Sadly I don’t have the artistic skills of Michelangelo and won’t be painting any masterpieces for the experiment!

Choosing what type of image to use is trickier than it sounds, as we are presented with an array of options.

First, should we use simple line drawings of the entities? The Noun Project has over 2 million small black and white line drawings. With such a choice of images we can find what we need. Here are some images of yams that I found on the site that we could use for our experiment.

These are great, and I know they are yams because I searched for images of yams on the website. But if I present these images to speakers I want them to tell me what they are. If the images aren’t instantly recognisable then participants will use different nouns to describe what they are seeing – is it a yam? A sweet potato? Manioc? Or some other entity? To tell you the truth, the third picture is actually a sweet potato! But it looks very similar to the first picture of a yam. Another problem is that these images can be quite abstract – and we can’t be sure that these symbolic representations of entities will be shared across different cultural and linguistic groups.

What about black and white pictures? These are cheaper to print and easier to standardise. But we do not see the world in black and white, and presenting entities as black and white pictures may make them harder to identify, especially when the lightness of the background and the object of focus are similar. We need to be sure that the images we choose are easy to identify or else we can end up with problems of misidentification.

Another possibility is to remove the background of the image. By doing this we can eliminate distractions and help the participant focus on the object in the image. However, the background is often key. Background information gives context that can influence how the speaker of a language perceives the entity in the image.

For instance, speakers may classify a fish that has been caught differently to a fish that is alive and swimming in the sea. The edible classifier is more likely with the former scenario, and a general classifier with the latter. But if we were to remove the background from both of these photos they would look strikingly similar! This leads us onto a very important question – what classifier would speakers of these languages use for a parrot if it was alive or dead?

So now we have decided to present images in colour and keep the background. But we must make sure that the background varies across different images. We don’t want participants to sort the entities into groups based on a colour or shape in the background or some other extraneous visual cue that may appear in several pictures!

For every psycholinguistic experiment that uses images there are multiple decisions that need to be made to figure out what type of image is required. The images we have chosen are specifically tailored to the nature of the languages we are studying to ensure that they are culturally relevant and thus identifiable.

For us, the pictures need to be realistic and represent the world around us. Sadly, we can’t take artistic licence with kangaroos and trampoline acts, as fun as that would be!

Poolish

Courtesy of thefreshloaf.com

Those who have chosen out of desire, or been forced out of dire necessity, to bake their own bread may have encountered the term poolish. It refers to a semi-liquid pre-ferment used in bread-making, a mixture of half water and half white flour mixed with a teeny bit of yeast and allowed to slowly ferment for several hours, up to a day, before mixing up the final dough.

The word itself is an exceedingly odd one, and has been the source of much head-scratching and inconclusive speculation among bread-bakers across the world: it looks like the English word Polish, but is spelled funny, and anyway seems to be borrowed from French, where the spelling would be funnier still. Most discussions of the technique include the obligatory etymological digression, usually fantastical, involving journeymen Polish bakers fanning out over Europe. Linguists too have gotten on the trail: David Gold’s Studies in Etymology and Etiology (2009) devotes a whole page to the question, but does not get too far.

In its current form it is technical jargon from French commercial baking, and has probably made its way to a broader public through Raymond Calvel’s influential Le goût du pain (‘The taste of bread’) from 1990. In his account:

This method of breadmaking was first developed in Poland during the 1840s, from whence its name. It was then used in Vienna by Viennese bakers, and it was during this same period that it became known in France. (2001 edition translated by Ronald Wirtz)

This explanation has been widely accepted, and appears in one form or another in any number of bread-baking books. But how could it even be true? The first problem is the word itself. Poolish is not the French word for Polish, and doesn’t look much like a French word anyway. In earlier French texts it crops up as pouliche, which looks more French and is indeed the word for a young mare, whose connection to bread dough is tenuous at best. But earlier French texts also have the spelling poolisch or polisch, which looks rather more German than French and suggests we follow the Viennese trail instead.

This thread of inquiry has its own potential hiccoughs. The German word for Polish is polnisch, with an [n], so would this not just be fudging things? Actually not: polisch, poolisch, pohlisch or pollisch turn up often enough in older texts as alternative words for ‘Polish’, particularly in southern varieties of German that include Austria. And it is exactly in these forms that we find it being used to refer to this particular process, juxtaposed with Dampfl (or Dampfel or Dampel), the term in southern Germany and Austria for a rather stiffer pre-ferment which goes through a shorter rising period, as in these two examples from 1865, one from Leopold Wimmer’s self-published advertising screed for St. Marxer brand (of Vienna) pressed yeast, where it turns up as Pohlisch:

the other from Ignaz Reich’s (of Pest, as in Budapest) account of ancient Hebrew baking practices, where it’s rendered as pollisch.

The term polisch (in all its variants) in this sense seems to have died a natural death in German, only to reemerge during the current craft-baking revival in the guise of poolish.

But if poolish was originally the (or a) German word for Polish, we run up against the sticky question of what it was actually referring to. Calvel repeats the story that this technique was invented by Polish bakers (which turns up in a 1972 article in The Atlantic Monthly, I think anyway, because it’s but coyly revealed by Google in snippet view), a supposition which lacks as much plausibility as it does historical attestation. Poland has traditionally been a land of sourdough rye bread. It seems unlikely that a novel technique involving the use both of white wheat flour and commercial pressed yeast (a relatively new product) would have been devised there and introduced into the imperial capital that was Vienna. So what on earth could it have meant?

Here I make my own foray into speculation; you read it here first. Poland is not just a land of sourdough rye bread, it is a land of a soup made from rye sourdough: żur or żurek (itself derived from sur, one variant of the German word for ‘sour’), still widely consumed and also sold in ready form for time-strapped gourmands. Since the Austro-Hungarian Empire included much of what had once been Poland, it isn’t too far-fetched to think that people in Vienna might have been familiar with this soup. And since the salient characteristic of poolish is that it is basically liquid, in opposition to more solid doughs, my guess is that the term poolish arose as a facetious allusion to żur: a soup-like fermenting dough mixture, like the thinned-out sourdough soup that Poles eat.

This theory has the minor drawback of lacking any positive evidence in its favor. So far the only 19th century reference to żur outside of its normal context that I have been able to find is as a cure for equine distemper, otherwise known as ‘strangles’. That leads us into the topic of pluralia tantum disease names…

What do we lose when we lose a language?

By the end of this century we are likely to lose half of the world’s six thousand languages. With each lost language a whole world of thought, customs, traditions, poems, songs, jokes, myths, legends and history gets lost. Knowledge of local plants, herbs, mushrooms and berries, their medicinal and culinary uses disappears, together with names for small rivers, mountains, valleys and forests. And this is only a tiny fragment of what we lose when we lose a language.

For a linguist, the loss of a language is first and foremost the loss of a system with a unique set of properties and rules which make it work. If there are any universal principles behind the architecture of human language, our only hope to figure them out is by studying the multitude of languages still existing on the planet. And endangered languages – those that we were lucky enough to have time and resources to study – show us time and again how vast the range of linguistic variability is. For example, it has been thought and stated by linguists and psychologists that grammatical tense can be marked by verbs only, as hundreds and hundreds of languages behave this way. Then we discovered that Kayardild, a critically endangered language of Australia, marks tense on nouns as well as verbs, making us reconsider this ‘universal’.

Archi, a language spoken in one village in the highlands of Daghestan (Caucasus, Russia), is an endangered language which I have been working on since 2004. There are only about 1300 speakers of this language and, as far as we know, there never have been more than that. Yet for centuries it was spoken in the Archi village (below) and passed to younger generations without being under any threat.

Because the community is so small, no writing system was ever invented for Archi – people in the village did not need to write to each other, and all communication with outsiders happened in one of the larger languages of the area. Until the 1940s this was Lak, then Avar (two large languages of Daghestan), and in the past 40 years, these have been increasingly replaced by Russian. Archi people lived a hard but self-sufficient life keeping sheep in the mountains for themselves and for trading (the alpine pastures within walking distance of Archi village make their lamb hard to compete with) and growing grains, mostly rye, on terraces: narrow strips of land dug into the steep mountain slopes. These grains were just for their own consumption, as it was too hard a job to grow any more than they needed to survive.

We cannot even say that the arrival of television, mobile phones and the internet – which happened more or less at the same time in Archi – is responsible for language decline. It is just that life in the mountains is very hard, so the Archi people have started moving to the cities, abandoning their traditional way of life and their language. Since I started working with Archi, two of the village’s primary schools have been closed and others are struggling as young people continue to leave. Kids abandon Archi as soon as they go to school or nursery in town, and their parents tend to follow suit. Older people in the village still wear traditional dress and keep up traditional skills, but the younger generation is moving away from these traditions. And when the last school closes in the village and no more children live there, the language’s fate will be sealed.

What will we lose once Archi is lost? We will lose a verbal system which boasts the largest number of verb forms registered – an Archi verb has up to 1.5 million forms. With this, we will forever lose the opportunity to figure out how the human brain can operate such a humongous system; we won’t be able to watch children learning such a complex language, going through stages of acquisition, making telling mistakes and overgeneralisations (like English kids do when they go through the stage of producing forms like goed, readed, telled, eated etc). We will have the knowledge that a system such as the Archi verb existed, but we will never know how it functioned.

We will lose a system of deictic pronouns (like English ‘this’ and ‘that’) which has five words in it. These mark not just the proximity to the speaker (like English ‘this’), but also the perspective of the listener, and the vertical position in regard to the speaker (see below). Even if these are not unique as lexical items, the whole linguistic system in which they operate is unique. We don’t know yet how these pronouns work in stories as opposed to conversation, and at the moment we have no good techniques to find this out.

jat ‘this, close to the speaker’
jamut ‘this, close to the hearer’
tot ‘that, far away from the speaker’
godot ‘that, far away and lower than the speaker’
ʁodot (the first sound is a bit like the French pronunciation of r) ‘that, far away and higher than the speaker’

We will lose a system where subject and object in the sentence work differently from what we are used to in European languages. In most European languages, the subjects of transitive and intransitive verbs have the same form (as in He arrived and He brought her along), while the object gets a different marking (She arrived vs. He brought her along). In Archi, the subject of an intransitive verb such as ‘arrive’ is marked the same as the object of a transitive verb such as ‘bring’:

Tuw qa ‘he arrived’

Tormi tuw χir uwli ‘She brought him’.

This is called Ergative-Absolutive alignment, and was first brought to the attention of linguists by the Australian language Dyirbal, which is already dead. Languages from several other families of the world, Archi among them, build sentences in the same way. As not many Dyirbal materials have been recorded, it is Archi and other endangered Daghestanian languages that have been making linguists reconsider universals about subject, object and verb relations.
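The difference between the two patterns can be put in a small sketch. It is only an illustration: the function name `core_case_marking` is made up, and the labels NOM, ACC, ERG and ABS are the usual glossing abbreviations, assumed here rather than taken from the text above.

```python
def core_case_marking(alignment, transitive):
    """Return case labels for the core arguments of a clause.

    The labels NOM/ACC/ERG/ABS are conventional abbreviations and are an
    assumption of this sketch, not terminology used in the post.
    """
    if alignment == "nominative-accusative":
        # Subjects always pattern together; the object is the odd one out.
        marking = {"subject": "NOM"}
        if transitive:
            marking["object"] = "ACC"
    elif alignment == "ergative-absolutive":
        if transitive:
            # The transitive subject is the odd one out.
            marking = {"subject": "ERG", "object": "ABS"}
        else:
            marking = {"subject": "ABS"}
    else:
        raise ValueError(f"unknown alignment: {alignment}")
    return marking


# English-style pattern: the subject is marked alike in both clause types.
print(core_case_marking("nominative-accusative", transitive=False))
print(core_case_marking("nominative-accusative", transitive=True))

# Archi-style pattern: the intransitive subject matches the transitive object,
# just as tuw appears in both 'Tuw qa' and 'Tormi tuw χir uwli'.
print(core_case_marking("ergative-absolutive", transitive=False))
print(core_case_marking("ergative-absolutive", transitive=True))
```

The only thing that changes between the two branches is which argument of the transitive clause shares its marking with the lone argument of the intransitive clause.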

This is only a glimpse of the impact that endangered languages have on linguistics as a discipline. In the last few decades, linguists have become much more aware of how invaluable endangered languages are and how fragile their futures, and more and more efforts are now directed to documenting and – whenever possible – preserving the linguistic diversity of the world.

How to break an impasse

Have Brexit negotiations met an impasse (where the first vowel sounds like the vowel in ‘him’), or an impasse where the vowel is like the initial sound in the French word bain /bɛ̃/? Or is it something in between?

If it is the former, congratulations! This borrowing from French has been successfully integrated into your native phonology, whilst simultaneously making a nod to its orthography.

If you opt to French-it-up then you have recognised that this word is not an Anglo-Saxon one, and that it should be flagged as such by keeping the pronunciation classic. Or you are French.

If you are somewhere between these two extremes, you are in good company. This highly topical word has no fewer than 12 British variants listed in the OED, reflecting various solutions to integrating the nasalized French vowel /ɛ̃/ and stress pattern into English:

Choosing which pronunciation to use for impasse is both a linguistic and social minefield, with every utterance revealing something about your education and social networks. No pressure then.

Recent news reports are providing a very rich corpus of data on the pronunciation of this specific word, with many variants being used within the same news report by different speakers, and perhaps even the same speaker.

For those yet to commit, choosing which to pick may be bewildering. So how do we avoid this impasse? Perhaps unsurprisingly, one tactic speakers use is to avoid altogether a word they aren’t confident pronouncing. It might be safer to stick to deadlock.

Watch BBC Political Editor Laura Kuenssberg translate deadlock into German, Spanish and French.

Ultimately, our cousins across the pond may have some influence in resolving this issue in the long term. The OED lists only two variants for U.S. English, with variation based on stress, not vowel quality, and U.S. variants of words (e.g. schedule, U.S. /skɛdjuːl/ vs U.K. /ˈʃedʒ.uːl/) are widely adopted in the speech of the UK public. But this will not necessarily be the case and the multiple UK variants may continue for some time.

The impasse goes to show that languages tend to tolerate a whole lot of diversity, even when the world of politics doesn’t.

Drinkable houses, edible canoes and Trojan horses

Michael Lotito, a French entertainer known as Monsieur Mangetout, became famous for his penchant for devouring objects that most would consider inedible, from bicycles and televisions to the most bizarre of all, a Cessna 150 light aircraft.

Though Monsieur Mangetout hailed from France, one might have thought that he was from the archipelago of Vanuatu. This small island nation is not only famous for being the most linguistically dense country in the world – with over 130 languages for a population of just a quarter of a million – but is also renowned for its intriguing possessive classifiers, which turn up in sentences when you talk about the things that you own, much like the possessive pronouns in English – my, your, hers etc. But in the Oceanic languages of Vanuatu these classifiers also tell us about how you will use the item that you own.

It took Michael Lotito two years to eat the Cessna 150!

The most common distinctions these classifiers make are between three types of possession: things that can be drunk, things that can be eaten, and a residual category used when the more specific classifiers for eating and drinking aren’t needed. So, for example, if you speak Paamese you can make a distinction between a coconut that you will drink, ani mak ‘my drinkable coconut’; one that you will eat the flesh of, ani ak ‘my edible coconut’; or one that you intend to sell, ani onak ‘my coconut for an unspecified use’.
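The Paamese pattern can be thought of as a simple lookup from intended use to classifier. The sketch below uses only the three first-person forms given above (mak, ak, onak); the function name `my_possession` and the English labels for the uses are made up for illustration, and falling back to the general classifier for any other use is an assumption based on its description as residual.

```python
# First-person ('my') classifier forms from the Paamese coconut example.
PAAMESE_MY_CLASSIFIERS = {
    "drink": "mak",
    "eat": "ak",
    "general": "onak",
}


def my_possession(noun, use):
    """Build a 'my X' phrase, choosing the classifier by intended use.

    Any use other than drinking or eating falls back to the general
    classifier (an assumption of this sketch).
    """
    classifier = PAAMESE_MY_CLASSIFIERS.get(use, PAAMESE_MY_CLASSIFIERS["general"])
    return f"{noun} {classifier}"


for use in ("drink", "eat", "sell"):
    print(use, "->", my_possession("ani", use))
```

The point of the sketch is that the noun stays the same while the classifier changes with what the owner plans to do with the coconut.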

But, more intriguingly, several languages of Central Vanuatu, spoken on the islands of Pentecost, Ambrym, Paama and Epi, use the food and drink classifiers for some rather strange items that one might not consider to be edible or drinkable — though of course Michael Lotito might beg to differ. The drink classifier in the language of North Ambrym covers a rather broad range of entities, including the obvious drinks such as water, tea, coffee and juice:

(1)	ma-n			we	/	ti	/	jus
	DRINK.CLASSIFIER-his	water		tea		juice
	‘his water/tea/juice’

But the classifier is also used with words for items that can’t be drunk, such as bwelaye ‘cup’ or bwela ōl ‘coconut shell (used as a cup)’, and also im ‘house’, hul ‘mat’ and bulubul ‘hole’. And in the Sa language spoken on Pentecost island, the food classifier can also be used with the word bulbul ‘canoe’!

The languages of Central Vanuatu where houses can be drunk, except for Raga which likes to be different.

How do you drink a house? How do you eat a canoe? While Michael Lotito might well be able to eat canoes and drink houses, the people who speak these languages certainly do not! So what explanation can be given as to why and how these non-drinkable and non-edible entities are included within the semantic domain of drinks and food?

The words meaning cups and containers of liquids are included with the drink classifiers in some of these languages due to a process of semantic extension. This is when the semantic coverage of a classifier is extended to include entities that are frequently associated with the core meaning of that classifier. This type of semantic extension is known as metonymy, where the word for a container can be used instead of the word for what it contains – e.g. in English we can use the word ‘dish’ to refer not only to a plate, but also to its contents. It is not such a large cognitive step to associate drinks with cups, and that is why containers of liquids are now included in the drink classifier’s semantic domain. However, it is quite a large cognitive leap to think that houses are associated with drinks and canoes with food.

To explain how houses are now classified along with drinks and canoes with food we have to look into the history of the languages and how these languages have changed through time. This is of course quite a difficult endeavour considering that these languages have no literary traditions and are only now just starting to be written down. We cannot consult old texts to see how the language used to be several hundred years ago as these don’t exist. Though limited records exist for a few languages going back to the mid 1800s, we mainly have to rely on comparing how related languages in the area differ and try to figure out how they got to be different.

Let’s start by looking at the language of Apma, spoken on Pentecost. The word for house, imwa, doesn’t occur with the drink classifier, but instead occurs in a different possessive construction where the owner is marked directly on the word for house, instead of on a classifier:

(2)	imwa=n		atsi
	house=his	person
	‘a person’s house’

This type of construction, called direct possession, normally occurs with possessions closely associated with the possessor, including body parts and kinship terms, but sometimes includes more intimate personal possessions as well. Now if we look at Apma’s neighbouring language, Ske, spoken to the south, the noun for house occurs with the drink classifier:

(3)	im	mwa=n			azó
	house	DRINK.CLASSIFIER=his	person
	‘a person’s house’

As you can see the word for house in Ske, which historically for the languages of Pentecost would have been imwa just like it is in Apma, has been split, where the first part im now means ‘house’, and speakers recognise the second part of imwa, namely mwa, as identical in form to the drink classifier. Speakers have now reanalysed the second part of the word for house as being the drink classifier, and now accept houses as being classified along with drinkable entities. A similar mechanism has occurred across several other languages of Central Vanuatu, and this is why houses are classified along with drinks.

Just what is a drinkable house anyway?

In most languages of Vanuatu this change didn’t occur and houses are either directly possessed or occur with the residual general classifier. But in a few other languages, the word for house developed into a distinct classifier that is different from the drink classifier. In the languages of Southern Vanuatu the word for house iimwa has now turned into a classifier for locations and places, and is distinct from the classifier for drinks — nɨmwɨ.

Now what about the edible canoes that I mentioned earlier? This strange occurrence happens in the language of Sa, also spoken on Pentecost island:

(4a)	a-k			anian		(b)	a-k			bulbul
	FOOD.CLASSIFIER-my	food               	FOOD.CLASSIFIER-my	canoe
	‘my food’					‘my canoe’

Historically, the word for canoe was waga in Proto Oceanic, and the word bulbul was used for a specific type of canoe. Sometimes linguists get lucky and there can be historical documents that help show us the way. Miss Hardacre, a missionary living in northern Pentecost in the early part of the twentieth century, made a small dictionary of the Raga language. In this dictionary she recorded the generic-specific word pairing waga bulbul, ‘canoe type/raft’. Now in Sa, the original word for canoe, waga, underwent several sound changes until it ended up looking like the food classifier, where only the medial vowel /a/ was left! The new word for canoe was bulbul, whereas the old generic term, waga, merged into the food classifier. In other languages of the area, such as Raljago, spoken on Ambrym, a separate classifier for canoes and boats emerged, distinct from the food classifier. Thus, the food classifier is a, but the canoe classifier is ai.

Sometimes when a merger takes place, the noun that merges into a classifier acts as a Trojan horse. Look back at the language of North Ambrym, where the drink classifier can occur with other nouns denoting houses, parts of houses, and mats. The word for house that originally merged into the drink classifier acts as a locus for semantic extension, opening a back door for other nouns that are semantically similar (those in the domain of houses) to enter the drink classifier as well.

I think Michael Lotito would have felt at home speaking one of the Oceanic languages of Vanuatu. He might even have said of his Cessna 150:

(5)	a-k			Cessna 150
	FOOD.CLASSIFIER-my	Cessna 150
	‘my edible Cessna 150’

Many thanks to Andrew Gray who runs the languages of Pentecost Island website and is my co-conspirator in turning this post into a journal article!