The headache-bringer-oner(er) of the English agentive suffix

The task of the light-turner-offer-onerer

Recently, a friend jokingly mentioned that he was thinking of hiring a light-turner-offer-onerer so that he wouldn’t have to get off the sofa to operate the light switch. In doing so, he made use of the extremely productive agentive suffix -er (also -or), which we use in English to derive a noun from a verb, to express the person or thing that carries out the action of the verb. The interpretation of this suffix is particularly transparent, even when used in completely novel ways, as in the recent article in The Economist newspaper cleverly titled The Baby Crisperer, drawing an analogy with The Horse Whisperer, while making reference to the gene-editing technology CRISPR-Cas9.

The butcher, the bak-er and the candlestick mak-er

But the striking thing about the opening example is the multiple occurrences of the agentive suffix. Most of the time in English the agentive suffix is simply added to the end of a word, regardless of whether the word in question has a single element (e.g. baker) or is a compound word (e.g. candlestick maker). But in the humorous example of the light switch operator, we are faced with a phrasal verb (or rather two phrasal verbs, turn off and turn on with the second instance of turn elided) and, in this case, the agentive suffix is added to each element of the phrasal verb. Omitting any of them (with the exception of the final -er, but we’ll come to that later) feels instinctively wrong (e.g. light-turner-offer-on, light-turner-off-onerer, light-turn-offer-oner, light-turner-off-on, light-turn-off-oner etc.).

So, just what’s going on here? Well, the issue lies in the fact that English phrasal verbs consist of a verb (which by itself has a different meaning) followed by a preposition or adverb, and it is precisely this ordering that appears to trip speakers up. In English, suffixes (by definition) come at the end of a word, but when a word has various elements to it, such as a compound word, there are multiple places that could potentially host a suffix. Since the meaningful element of many English compounds comes at the end (e.g. a houseboat is a type of boat that people live in, while a boathouse is a type of house for boats), it usually goes without saying that the suffix attaches to the final word, but if that ordering is upset in any way we tend to see different forms competing with each other (e.g. mothers-in-law vs. mother-in-laws, directors-general vs. director-generals).

A boathouse (left) and a houseboat (right)

Drawing a parallel with inflectional suffixes, which only affect the verb in a phrasal verb (e.g. wash up > he washes up > he washed up, pass by > she passed by > she’s passing by), we might expect the same to be true when it comes to the agentive suffix -er. Indeed, this is precisely what we see with established forms like passer-by (recorded in the OED as early as 1568). The historical form knocker-up (recorded in the OED from 1861), which referred to a person who would rouse workers by knocking on their window, also followed this pattern; it’s worth noting, however, that the form knocker-upper also exists, as seen in this BBC article about the profession, but it’s unclear whether this is a recent innovation or not. (NB. With the demise of this profession, readers can be excused for interpreting the term knocker-up(per) as a man with a predisposition for getting women pregnant.)

A knocker-up(per) at work

Other terms derived with the -er suffix, however, do not adhere to the pattern of marking only the verb element of a phrasal verb. For instance, we often talk of a property in need of renovation as a fixer-upper. Although we do encounter the forms fixer-up and fix-upper, fixer-upper is by far the most widely used term (recorded in the OED from 1948, and with 41 million Google hits, as opposed to fewer than 180 thousand hits for either fixer-up or fix-upper); no doubt the US reality TV show about home renovations, Fixer Upper, has helped popularise this term, in the US at least.

In many cases, a form which marks both elements of a phrasal verb co-exists with a form which marks only the first element of the phrasal verb, with the former appearing to be a much more recent development. Below are some examples of this (with dates showing the earliest recorded occurrences in the OED):

washer-up (1907)       washer-upper (1961)
picker-up (1611)         picker-upper (1913)
looker-up (1867)         looker-upper (1934)
opter-out (1968)         opter-outer (not recorded)

The form opter-outer was not found in the OED, but is sometimes encountered (a Google search results in around 100 hits), such as in this Telegraph article about opting out of a pension. The opposite term, opter-inner, results in a mere 2 hits, however surprising that might seem following last year’s barrage of GDPR opt-in-related emails that we were all subjected to. (Perhaps this reflects the fact that, in the pre-GDPR world, we tended to opt out of things, rather than the reverse?) One of those hits is this short web article, where the writer is bemoaning the number of spam emails she receives; in it, she not only uses the forms opted-in and opter-inner – the former illustrating the fact that inflectional suffixes generally only attach to the verbal element of the phrasal verb – but also uses opt-in as a noun, stating that “not all opt-ins are created equal”, where the inflectional suffix is instead on the preposition.

But what’s even more interesting than the -er suffix appearing on both elements of a phrasal verb is that some speakers take this process one step further: once every element has been marked with the -er suffix, it’s as if the word as a whole then needs marking with the suffix again, leading to variants like washer-upperer, doubling up on the suffix on the final element. Based on Google searches, the form with the double suffix is surprisingly less common than I (as a speaker of British English) ever thought it was – washer-upperer returns a mere 244 hits on a Google search, while washer-upper returns 47,500 and washer-up returns 110,000 – although it’s entirely possible that in spoken language forms like this are much more frequent, and that Google searches, which only capture what people are prepared to commit to writing, are skewing the results. In any case, common or otherwise, such forms exist. OK, so no doubt some forms with a double -erer suffix are produced for humorous effect, as our opening example of the light-turner-offer-onerer was, but might there be an explanation for why speakers produce these forms in the first place?

One possible explanation is that speakers add the final -er by analogy with agentive nouns formed from verbs that themselves end in -er and which thereby end in the same -erer sequence, such as gatherer, plasterer, murderer. If this is the case, we might hypothesise that the first -er on the particle serves to make the phrasal verb ‘feel’ more verb-like (from the perspective of the suffix), giving the second -er which performs the agentive function something that it is happy to attach to. Could this possibly explain why Vermont Mountain Real Estate have listed a property on their books as being “a good place to fix upper,” perhaps mistakenly interpreting the -er suffix on the adverb as somehow forming a verb (maybe even a back formation from “fix upperer”)? (A much less interesting explanation, of course, is that this is just a typo.)

This house is a good place to fix upper!

The locus of the plural marker -s in agentive nouns of this sort lends some weight to this idea. In forms that mark only the first element of the phrasal verb, such as passer-by and washer-up, the plural marker almost always attaches to the first element together with the agentive suffix, just as we would expect with inflectional suffixes (recall he washes up, she passed by), so we talk of the passers-by or the washers-up, but are less comfortable with the washer-ups (although it should come as no surprise by now that both forms are found).

But if both elements of the phrasal verb take the agentive suffix, the plural marker attaches to the rightmost of the two (or more) suffixes. We can no longer say the washers-upper, but have to say the washer-uppers. When both elements take the agentive suffix, speakers appear to reanalyse the word as a single unit which no longer permits suffixes to occur internally (i.e. on a non-final element). And once it’s been reanalysed as a single unit, it almost seems right to then want to attach the -er suffix to the unit as a whole.
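The pattern described so far can be sketched in a few lines of Python. This is only a toy illustration, not a model of any speaker’s grammar: the function names, the pattern labels and the small lookup table of -er spellings are all hypothetical, and the plural rule encodes only the majority pattern (it ignores attested variants such as washer-ups).

```python
# Agentive (-er) forms as spelled in the post. A lookup table is used so that
# the sketch does not have to model English spelling rules (an assumption).
ER_FORMS = {
    "wash": "washer", "pass": "passer", "fix": "fixer", "opt": "opter",
    "up": "upper", "out": "outer",
}

# Hypothetical labels for the three marking patterns discussed in the text.
PATTERNS = ("verb-only", "both", "both+final")


def agentive(verb, particle, pattern):
    """Return the two hyphen-separated parts of an agentive noun."""
    if pattern not in PATTERNS:
        raise ValueError(f"unknown pattern: {pattern}")
    if pattern == "verb-only":           # washer-up
        return [ER_FORMS[verb], particle]
    marked_particle = ER_FORMS[particle]  # washer-upper
    if pattern == "both+final":           # washer-upperer
        marked_particle += "er"
    return [ER_FORMS[verb], marked_particle]


def plural(verb, particle, pattern):
    """Attach plural -s to the verb element if only the verb carries -er,
    otherwise to the rightmost element of the reanalysed unit."""
    first, last = agentive(verb, particle, pattern)
    if pattern == "verb-only":
        first += "s"
    else:
        last += "s"
    return f"{first}-{last}"


for pattern in PATTERNS:
    singular = "-".join(agentive("wash", "up", pattern))
    print(f"{pattern:>10}: {singular} / {plural('wash', 'up', pattern)}")
```

The only design choice worth noting is that the plural rule looks at the marking pattern rather than at the spelling of the word, which mirrors the idea that it is the reanalysis as a single unit, and not the shape of the particle, that pushes the -s to the right edge.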

So while some may argue that this doubling up of the suffix is done intentionally, as a sort of metalinguistic joke, there are reasons to believe this isn’t always the case and that sometimes such forms (albeit markedly colloquial in nature) are produced because they just feel right and/or are following a rule in a speaker’s internal grammar.

Anyway, thinking about all this has brought on a headache, so I’m off to make myself an automatic day-maker-betterer(er)!

A fun bit of marketing, using the agentive suffix

Adventures in Historical Linguistics

While linguistics does not cut the same kind of glamorous profile in fiction as, say, international espionage or organized crime, it does pop up now and again. Even historical linguistics. Having stumbled across a couple of older examples recently (thus, historical fictional historical linguistics), I commend them to our readers as an alternative to the cheap thrills that might otherwise tempt them.

Leon Groc’s Le deux mille ans sous la mer (‘2000 years under the sea’), from 1924, starts out with our heroes supervising the construction of a tunnel under the English Channel. They discover a mysterious inscription on a rock face. Fortunately, one of the party is a philologist, and identifies it as Chaldean (i.e. a form of Aramaic)! And a particularly archaic variety at that. This impresses the rest of the party, at least as much as the content of the inscription itself: Impious invaders, you shall not go any further. However, a subsequent mining accident forces them to break through the rock, where they discover a cavern inhabited by a race of pale blind people, descendants of Chaldeans (or to be more precise, speakers of Chaldean) who had sought refuge in that cavern from some long-forgotten disaster, only to discover they couldn’t find a way out. The learned philologist applies his practical knowledge of Chaldean in communicating with them. I won’t spoil the fun for those of you planning to read it; but it does not go well.

James De Mille’s A Strange Manuscript Found in a Copper Cylinder from 1888 features members of a British expedition surveying the South Pacific becoming stranded in an unknown country with – once again – some cave dwellers, who call themselves Kosekin and speak a Semitic language. In the usual fashion of such stories in this period, there is a narrative within a narrative, in this case the manuscript directly relating the adventure, and the commentary of the members of the yacht party who discovered it. While the core narrator (named More) merely recognizes some affinity to Arabic, one of the members of the yacht party just so happens – once again – to have a philological background, which, after a lengthy digression on the comparative method and Grimm’s law, leads him to conclude that the underground race speaks a language descended from Hebrew:

I can give you word after word that More has mentioned which corresponds to a kindred Hebrew word in accordance with ‘Grimm’s Law.’ For instance, Kosekin ‘Op,’ Hebrew ‘Oph;’ Kosekin ‘Athon,’ Hebrew ‘Adon;’ Kosekin ‘Salon,’ Hebrew ‘Shalom.’ They are more like Hebrew than Arabic, just as Anglo-Saxon words are more like Latin or Greek than Sanscrit.

Further proof of the power of historical linguistics in a tight situation comes from E. Charles Vivian’s City of Wonder (1923). Again in the South Pacific, a group of adventurers is attacked by a strange woman (speaking, of course, a strange language) in charge of a monkey army. Taking stock after having slaughtered the attackers, the narrator asks one of his companions:

“What is the language she used?” I asked.

“The nearest I can tell you, so far, is that it’s a sort of bastard Persian,” he answered. “It’s a dialect built on a Sanskrit foundation—in my youth I studied Sanskrit, for it’s the key to every Aryan language or dialect in the East, and I always meant to come East. I must stuff you two.”

“Stuff us?” Bent asked.

“Fill you up with words that will be useful—it’s astonishing what you can do in a language if you know three or four hundred words in common use. If you hear it and have to make yourself understood in it, the construction of sentences very soon comes to you. That is, if the language is built on an Aryan foundation, as this is.”

It’s that easy! You just need to learn the method.

Back underground, Howard De Vere’s A Trip to the Center of the Earth, first published in New York Boys’ Weekly in 1878, is a story I haven’t been able to track down yet, but from the description in E.F. Bleiler’s Science Fiction: The Early Years, it promises to be one of the high points in early dime novel treatments of historical linguistics. A pair of boys exploring Kentucky’s Mammoth Cave come across an underground world where

pallid underground people speak English of a sort, in which inflections have disappeared and certain alterations have taken place.

What could those certain alterations be? As an added bonus, the story is of culinary interest, as the next sentence of Bleiler’s description goes:

Geophagists, they live on a nourishing clay, access to which is sometimes barred by gigantic spiders of extraordinary venomosity.

Alongside lost race fantasies, futuristic science fiction is another obvious vehicle for literary forays into historical linguistics. Régis Messac’s Quinzinzinzili from 1935 is a particularly interesting variant, being – as far as I know – the only serious fictional treatment of contact linguistics. (Admittedly I haven’t looked elsewhere.) Set in the period after a fictional World War II (which everybody in this interwar period seemed to be expecting anyway), its narrator is trapped in a post-apocalyptic world alone with a particularly annoying handful of pre-teens. (And thus probably the most gruesome post-apocalyptic story ever written.) They are largely French speakers, but there are Portuguese speakers and English speakers among them as well. They develop a sort of pidginized French, colored by spontaneous sound changes such as the nasalization of all vowels, along with curious semantic shifts. The title Quinzinzinzili reflects all this, being their rendition of the second clause in the Lord’s Prayer in Latin (qui es in cœlis ‘who art in Heaven’), used as a name for their inchoate deity. I won’t say any more because I think everybody should read it. Way better than Lord of the Flies, which it preceded and superficially resembles. (And which has no noteworthy linguistic content.)

And if anybody knows a good source for back issues of New York Boys’ Weekly, our lines are open.

A Rainbow of Shared Diversity: Culture and Language in the South Pacific

When we think of life in the South Pacific we often imagine relaxing in the shade of a coconut palm listening to the soothing sound of Israel Kamakawiwoʻole’s ‘Over the Rainbow’ (the official song of this blog post and mandatory listening!). But the South Pacific is in fact culturally diverse, and linguistically too, with around 600 languages in the Oceanic family spread across Micronesia, Melanesia and Polynesia.

The original migration of the Oceanic-speaking people started around 1600 BCE from the north east of New Guinea and they went on to colonise the uninhabited islands of the Pacific Ocean, with New Zealand being the last country to be inhabited by Polynesian seafarers as late as 1285 CE. The vast distances have created huge cultural differences amongst contemporary Oceanic peoples, yet they all speak languages that stem from Proto-Oceanic – the ancestral language of all of Oceania. For example, the Polynesians are famed for their ability to cross vast swathes of ocean by using star charts made out of sticks, whereas the Melanesians were not great seafarers. However, all Oceanic peoples share similar horticultural practices of cultivating yam and taro root crops, which form the basis of an Oceanic diet.

The enormous cultural diversity amongst the Oceanic speaking people has led to widespread variation in the languages spoken in the South Pacific. In particular we can see the cultural influence on the various languages in how they encode possessive relationships in the language. In the most basic way, an Oceanic language makes a difference in the way it treats alienable and inalienable possessions. We’re not talking UFOs here! Inalienable possessions are those that have an inherent connection with the person to whom they belong – such as body parts or members of the family. Alienable possessions are items that can easily be transferred from one owner to another, such as food, baskets, or other household items.

In Port Sandwich, a language spoken in Vanuatu, possessions that are considered inalienable often have a suffix that encodes the possessor (my, your, his/her) directly attached to the possessed noun:

(1)    naru-ngg
‘my son’

By contrast, when speaking about sandwiches (and all other alienable possessions) in Port Sandwich, encoding is indirect. The possessor suffix is not able to attach directly to the possessed noun, but instead must attach to a separate marker of possession:

(2)    sanwis        isa-ngg
sandwich        POSS.MARKER-my
‘my sandwich’

Sandwiches aside, in many Oceanic languages this indirect construction that is used for alienable possessions has expanded to include various different semantic types of possession. Languages have separate possessive markers, often called classifiers. Many languages have a three-way split, such as in the language Wuvulu (spoken in the Western Islands off the north coast of Papua New Guinea), for possessions that are eaten, drunk or everything else:

(3a)    ana-u        niu
FOOD-my        fish
‘my fish (to eat)’

(3b)    numa-mu        upu
DRINK-your        coconut
‘your coconut (to drink)’

(3c)    ape-mu        ponata
GENERAL-your        dog
‘your dog (as a pet)’

Some languages make even more semantic distinctions between alienable items. These classifiers often encode culturally important semantic distinctions. Vera’a, spoken in northern Vanuatu, has eight different possessive classifiers: food, drinks, canoes, houses, beds and mats, prized possessions, long-term possessions, and one for everything else. The Micronesian languages have the largest inventory of classifiers in Oceanic. The Chuuk language has developed thirty-five distinct classifiers (yes, thirty-five!), several of which are used to categorise different types of edible possessions. For example, there is a classifier for cooked food, one for raw food, one for leftover food, and even one that is used with food taken on a journey – great for classifying take-away food!

The yéméti classifier in the Chuuk language for food for a journey is great for take-away pizza, whereas the nikita classifier could be used the day after when you want to eat the leftover pizza – if there is any!

In other languages, speakers are able to create new classifiers when they need to on an ad-hoc basis. This mechanism is particularly prevalent in the languages of Micronesia and New Caledonia. Nêlêmwa, spoken in New Caledonia, can create new classifiers by repeating the possessed noun and adding a suffix to show the possessor, for example mwa ‘house’ (4a) can have the possessor suffix attached (4b), but if a speaker adds an adjective then the possessed noun must be repeated and the directly possessed noun functions as a classifier (4c). In this way a speaker of Nêlêmwa can create new classifiers whenever the need arises.

(4a)    mwa
‘house’

(4b)    mwa-n
house-his
‘his house’

(4c)    mwa-n        mwa        doo
house-his        house        earth
‘his earth-house’

Though cultural diversity plays a role in the formation of classifiers that are unique to particular languages in the Pacific, there is a commonality among classifiers, and languages that are located far apart often have classifiers that encode similar semantics, which means that though culturally diverse, some important cultural aspects are shared across the Oceanic peoples. For example, many of the Micronesian languages have developed classifiers for beds, mats and pillows. But the language of Vera’a spoken in Northern Vanuatu (over 2500 kilometers away) has also developed a classifier for sleep-related possessions. Similarly, classifiers for domesticated animals have developed in the languages of Micronesia, in Mussau and Seimat (both spoken on the offshore islands of Papua New Guinea), and in Nêlêmwa and Iaai, spoken in New Caledonia. The words used for these classifiers can’t be traced back to a single historical root, which means that these are sporadic innovations in these languages and point to the shared cultural life of the Oceanic peoples.

Just as speakers of different languages can name varying numbers of colours in a rainbow, with Israel Kamakawiwoʻole’s mother tongue Hawaiian distinguishing six colours in contrast to English’s seven, speakers of Oceanic languages differ in the number of ways of categorising their possessions.

A whole nother story

Words do some truly inventive things when they change, and change they always do. Some switch their sounds around, like when hros became hors, nowadays spelt with an extra e as horse. Some lose their sense of having an internal composition, like when wāl-hros ‘whale-horse’ became walrus. Some cave in to peer pressure and change their looks to conform with others, including one of my favourite cases in English, when under the influence of similarly-meaning words probably, possibly, plausibly which all end in -bly, we get supposably, which is how in some varieties of modern English you can say ‘supposedly’. One of the truly odd things that words do, though, is to start stealing sounds from their neighbours.

A famous case in English is an apron, which used to be a napron, until the n got snaffled by the a. It goes the other way too. A newt was originally an ewt. Of course, in Middle English when this n-thievery was underway, there were a few more words complicit in the heist, for example my napron also became mine apron, and your napron became yourn apron, since at that stage in English, words like my/mine, your/yourn worked like a/an. So, ever wondered why the nickname for Edward is Ned? As in mine Ed, ourn Ed? Got it? Speaking of which, nickname was originally ekename and was also involved in a swindling of n from the previous word (the eke-, which is related to eke in ‘eke out a living’, meant an addition or supplement, so mine ekename was my additional name).

It’s not only in English that words have indulged in this shifty business. In late Latin, the word originally borrowed from Greek apotheca would have been l’aboteca, which you may recognise today as Italian la bottega, Spanish la bodega or French and English boutique. In Danish, the plural pronoun meaning ‘you’ is I, related to English ye, but in closely related Swedish it’s ni with an extra n. Where did it get it? Theft. The corresponding plural verbs used to end in -en, like haven i ‘have you?’, and you can see what happened next. In fact, the same game played out a thousand years earlier with singular ‘you’ in several West Germanic languages, except this time it was the verb that kept a piece of the pronoun, when phrases like habēs thū ‘have you?’ became habēst thū, which you might recognise as English havest thou.

How does all this shifting of sounds between words come about? To get an idea, try saying quickly: ‘an apron, a napron, an apron’, and you’ll already have a sense of how this is possible. Unlike on the printed page, words in spoken language stream forth in a smooth and almost seamless flow, and the human brain performs some impressively deft reverse-engineering to slice that stream back up into words. In fact, picking out the individual words in speech is one of the first monumental intellectual tasks we embark on as infants, even before we start learning what the words mean. Recent research suggests that we may even begin this process from within the womb, where we get pre-season access to language courtesy of the muffled rhythms of speech that seep in to us from outside.

Now, you may well wonder how anyone, let alone an infant, can slice up a speech stream into individual words without knowing any of the meanings. Good question. It would appear that the brain operates like a finely tuned statistical inference machine, storing and calculating the relative frequencies at which sounds follow one another, and from this it can begin to pinpoint where the word boundaries are located, since at those boundaries, it is much less predictable what sounds will come next. The trick, then, is that word boundaries are zones of unpredictability, irrespective of their meanings. Of course, we might ask next, why is it that the sounds are so predictable inside the words? One of the reasons for that has to do with what linguists term ‘phonology’: the fascinating way in which sound sequences themselves are intricately structured and highly non-random within the words of human languages, but I’m afraid that for now, that’s a whole nother story.
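To make the idea of ‘zones of unpredictability’ a little more concrete, here is a minimal Python sketch of that kind of statistical slicing. It is a toy under stated assumptions, not a model of what infants actually do: the three made-up ‘words’, the choice of two-letter syllables as the units, the function names and the threshold of 0.75 are all invented for illustration.

```python
from collections import Counter


def transition_probs(units):
    """Estimate P(next unit | current unit) from adjacent pairs in the stream."""
    pair_counts = Counter(zip(units, units[1:]))
    first_counts = Counter(units[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(units, threshold=0.75):
    """Cut the stream wherever the next unit is poorly predicted by the current one."""
    probs = transition_probs(units)
    words, current = [], [units[0]]
    for prev, nxt in zip(units, units[1:]):
        if probs[(prev, nxt)] < threshold:  # unpredictable: likely a word boundary
            words.append("".join(current))
            current = []
        current.append(nxt)
    words.append("".join(current))
    return words


# Hypothetical toy lexicon of three nonsense words, each made of three syllables.
LEXICON = ["tupiro", "golabu", "bidaku"]
# A made-up utterance order in which every word is followed by both other words.
ORDER = [0, 1, 2, 0, 2, 1, 0, 1, 0, 2, 1, 2, 0]
spoken = [LEXICON[i] for i in ORDER]

# The 'speech stream': syllables with no word boundaries marked.
stream = [word[i:i + 2] for word in spoken for i in range(0, 6, 2)]

print(segment(stream))
```

Inside a word, each syllable is always followed by the same next syllable, so the estimated probability is 1.0; at the end of a word the next syllable could belong to either of the other two words, so the probability drops to 0.5 and the sketch places a boundary there, without ever being told what any of the words mean.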

Double trouble treble

You’ll get in trouble if you drink a tripel, the strong pale ale brewed by the most hipster of monks, the Trappists.

The Lowlands are the Hoxton of Europe

Tripels have three times the strength (around 8-10% ABV) of the standard table beer historically consumed by the monks themselves. This enkel or ‘single’ beer was traditionally not available outside the cloisters, while the dubbel (a double strength dark brown beer made with caramelized beet sugar) was sold to provide income for the monastery. Although the term enkel is no longer in common beer parlance (it is on the cusp of a comeback), dubbel and tripel have held their ground. It is generally thought that the tripel takes its name from its threefold strength, but it is also sometimes claimed that it is because it has three times the malt of a regular brew. A quadrupel is VERY strong.

As we have seen already in this blog when counting sheep in Slovenian and yams in Ngkolumbu, means for the expression of quantities and multiplication are often linguistically fascinating. Not least the doublet treble and triple, which originate from the same etymological source.

The Latin word triplus ‘threefold, triple’ first entered English via Old French treble. Not satisfied with claiming the space previously occupied by the Old English adjective þrifeald ‘threefold’, it turned up again by the 15th century as the adjective triple.

This triad of modifiers (threefold, treble and triple) exemplifies some of the pathways by which lexical synonymy can come about. The first word was formed through a compounding process (i.e. the numeral three forming a new word with the multiplicative form –fold), the second entered the language through direct borrowing, and the third through a second wave of borrowing (either from Old French triple or Latin triplus).

We don’t just find words competing to express the same meaning, but also parts of words. The –fold element of threefold, tenfold and manifold, and the –plus of triplus, are argued to have developed from the same Proto-Indo-European root *pel ‘to fold’. To complicate things even further, the now obsolete treblefold was attested between the 14th and 16th centuries. Words, it seems, like to fight for the same space, and can sometimes be incestuous.

Since entering English over 500 years ago, triple and treble have staked out different paths, but retained similar meanings in at least some of their manifestations, as explored by Catherine Soanes on the OxfordWords blog. In terms of frequency, triple is the stronger twin (or is it a triplet? quadruplet?), ending up triumphant with around 6 times more occurrences in the Oxford English Corpus.

But treble has some resilience. Although the official Scrabble board has double and triple word scores, treble word scores are occasionally referred to on the net (albeit erroneously, or in a devil-may-care way), such as in Charlie Brooker’s article on how to cheat at Scrabble. I even found a ‘threefold word score’ on a Scrabble knock-off site. Lawyers to the ready!

This demonstrates that these adjectives really are semantically interchangeable for the most part, even though their distributions are not identical.

The take home? While not every monastery sells the same tripel, they will all get you drunk.



Hallowe’en will soon be upon us, so it is only right we turn our attention to monsters. Consider the werewolf. It’s a wolf, sort of, as the name indicates, but what’s a were? The usual assumption is that it’s a leftover of an older word meaning ‘man’ that fell completely out of fashion by the 14th century. As a result we have what looks like a compound word, except that one of the parts doesn’t have any meaning on its own. Perhaps not, but that hasn’t stopped people from squeezing some value out of it nonetheless: if a werewolf is a person who turns into a wolf — or at any rate, part person, part wolf — then a were-bear is a mixture of person and bear, and so on down to were-turtles.

Actually, people don’t seem to be that literal-minded when it comes to word meanings, if the various were-creatures in circulation are any evidence. The monster from “Wallace and Gromit: Curse of the Were-Rabbit” is not half-human, half-rabbit, but more just kind of a monster rabbit, with a thicker pelt. (Visually calqued, I suspect, from the not-particularly wolf-like wolfman of the wolfman movies featuring Lon Chaney Jr.)

And were-fleas, to the extent that they exist, appear to be carriers of lycanthropism rather than human/insect conglomerates. None of this is yet reflected in the Oxford English Dictionary’s entry on were– (you need a subscription for that but it’s free if you have a UK public library card!). Give it a few decades more maybe.

Strangely, words for werewolf in other languages share a propensity for being compounds made up of ‘wolf’ plus some other completely opaque element. The first part of Czech vlkodlak is vlk, which means ‘wolf’, but dlak on its own is not an independent word. (Not in Czech at any rate, but in the related language Slovenian the equivalent word volkodlak is clearly made up of volk ‘wolf’ and dlaka, which means ‘hair’ or ‘fur’.) And the French werewolf, loup-garou, has the word for ‘wolf’ in it (loup), but garou is not an independent word (other than being an unrelated homonym meaning ‘flax-leaved daphne’). That part seems to have been our very own Germanic word werewolf borrowed at an early date (earliest attestation as garwall from the 12th century). Both of these have, like werewolf, given rise to further monstrous hybrids like Czech prasodlak, from prase ‘pig’, or the French cochon-garou.

In fact, Czech and French have gone one step further than English. Though I just wrote that dlak and garou were not words, that was being a bit pedantic. Neither of them is listed in the authoritative Academy dictionaries of Czech and French, but nonetheless they do seem to have split off from their host body, rather as happened – if we can be permitted to mix monster metaphors – to the hero of 1959’s “The Manster” (a.k.a. “The Split”).

For example, this Czech website tells us about vlkodlaci i jiní dlaci ‘werewolves and other were-creatures’ (dlaci is the plural of dlak), and in French the phrase courir le garou ‘run the garou’ used, at least, to be in circulation, meaning basically ‘go around at night being a werewolf’. That use in turn apparently spawned a verb garouter, meaning much the same thing. The curse lives on.

Optimal Categorisation: How do we categorise the world around us?

People love to categorise! We do this on a daily basis, consciously and subconsciously. When we are confronted with something new we try and figure out what it is by comparing it to something we already know. Say, for instance, I saw something flying through the air – I may think to myself that the object is a bird, or I may say it is a plane based on my previous experiences of birds and planes. Of course the object may turn out to be something completely new, perhaps even Superman!

Is it a bird? Is it a plane? No it’s Superman!

Our love of classification runs deep in scientific enquiry. Botanists and zoologists classify plants and animals into different taxonomies. Even the humble linguist loves to classify – is this new word a noun or a verb? What about the new word zoodle that was recently added to the Merriam-Webster dictionary? Is it a thing? Or an action? Can I zoodle something or is it something I can pick up and touch? Well apparently zoodle is a noun which means ‘a long, thin strip of zucchini that resembles a string or narrow ribbon of pasta’. To be honest, I love eating zoodles, though until now I never knew what they were called!

The way people classify entities around them has become encoded in the languages we speak in many different ways. The most obvious example that springs to mind is grammatical gender: when we learn a new language, like French or German, we are confronted with a grammatical gender system. French has two genders – masculine and feminine. But German has three – masculine, feminine and neuter. Other languages can have many more gender distinctions. Fula, a language spoken in west and central Africa, has twenty different gender categories!

So what exactly are grammatical gender systems and how are they realised in different languages? Gender systems categorise nouns into different groups and tend to appear not on the noun itself, but on other elements in the phrase. In German, nouns are split into three different gender categories – masculine, feminine and neuter. The gender of a noun is shown by using different articles (the word ‘the’ or ‘a’) and sometimes by changing the ending of an adjective, but never on the noun itself. Thus the word for ‘the’ in German is either der, die or das depending on whether the noun in the phrase is masculine, feminine or neuter.

(1)        der       Mann
              the       man

(2)        die        Frau
              the       woman

(3)        das       Haus
              the       house

This is called ‘agreement’ as the adjectives and articles must agree with the gender of the noun. In a language with gender, each noun typically can only occur in one gender category.
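For readers who like to see this kind of thing spelled out, here is a minimal sketch of agreement as a lookup, using only the three nouns and three articles from examples (1) to (3). The names `NOUN_GENDER`, `DEFINITE_ARTICLE` and `with_article` are made up for illustration, and the sketch only covers the basic dictionary (nominative singular) forms shown above, not the rest of the German article system.

```python
# Each noun is stored with exactly one gender, as in examples (1)-(3).
NOUN_GENDER = {
    "Mann": "masculine",
    "Frau": "feminine",
    "Haus": "neuter",
}

# The word for 'the' is chosen by the gender of the noun, not by the noun's form.
DEFINITE_ARTICLE = {
    "masculine": "der",
    "feminine": "die",
    "neuter": "das",
}


def with_article(noun: str) -> str:
    """Return the noun preceded by the article that agrees with its gender."""
    gender = NOUN_GENDER[noun]
    return f"{DEFINITE_ARTICLE[gender]} {noun}"


for noun in NOUN_GENDER:
    print(with_article(noun))
```

Notice that the noun itself never changes: all the gender information shows up on the article, which is the point made above.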

Not every language has a grammatical gender system, but such systems are widespread, with around 40% of all languages having one. English is quite a poor example when it comes to gender. There is no real gender agreement in English, with the exception of pronouns. We have to say: Bill walked into the grocer’s. He bought some apples. Here the pronoun he must agree with the gender of the noun that was previously mentioned. English uses he, she and it as the only markers of gender agreement.

Languages behave differently in how they allocate nouns to the different genders, which can be very baffling for language learners! Why in French is chair feminine, la chaise, but in German it is masculine, der Stuhl? How a language allocates nouns to its gender categories can seem somewhat arbitrary – the only semantically obvious choices are the words for women and men, which fall into the feminine and masculine genders respectively.

But wait! If you thought the English gender system was dull, think again! A couple of months ago my piano was being restored and when it was being moved back into the lounge the piano movers kept saying: “pull her a little bit more” and “turn her this way”. The movers used feminine pronouns to refer to the piano. In English, countries, pianos, ships and sometimes even cars can take feminine pronouns.

Grammatical gender isn’t the only way languages classify nouns. Some languages use words called classifiers to categorise nouns. Classifiers are similar to English measure terms, which categorise the noun in terms of its quantity, such as ‘sheet of paper’ vs. ‘pack of paper’ or ‘slice of bread’ vs. ‘loaf of bread’. Classifiers are found in languages all over the world and are able to categorise nouns depending on the shape, size, quantity or use of the referent, e.g. ‘animal kangaroo’ (alive) vs. ‘meat kangaroo’ (not alive). Classifier systems are very different to gender systems as nouns in a language with classifiers can appear with different classifiers depending on what property of the noun you wish to highlight. There are many different types of classifier systems, but to keep things short I am just going to talk about possessive classifiers, which are mainly found in the Oceanic languages, spoken in the South Pacific.

When an item is in your possession we use possessive pronouns in English to say who the item belongs to. For instance if I say ‘my coconut’ – the possessive pronoun is my. In many Oceanic languages a noun can occur with different forms for the word my depending on how the owner intends to use it. For instance the Paamese language, spoken in Vanuatu, has four possessive classifiers and I could use the ‘drinkable’ classifier if I was talking about my coconut that I was going to drink. I would use the ‘edible’ classifier if I was going to eat my coconut. I would use the classifier for ‘land’ if I was talking about the coconut growing in my garden. Finally, I could use the ‘manipulative’ classifier if I was going to use my coconut for some other purpose – perhaps to sit on!

(4)        ani                   mak
              coconut           my.drinkable
              ‘my coconut (that I will drink)’

(5)        ani                   ak
              coconut           my.edible
              ‘my coconut (that I will eat)’
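The contrast with gender can be put in the form of a small sketch: in a gender system the noun fixes the category, whereas here the speaker picks the category for the occasion. The function `my_possession` and the table `MY_FORMS` are invented for illustration. Only the two forms given in (4) and (5) are filled in; the ‘land’ and ‘manipulative’ forms are not shown in this post, so the sketch leaves them empty rather than guessing.

```python
# Forms of 'my' keyed by possessive classifier.
# Only the forms from examples (4) and (5) are included; None marks
# classifiers whose form is not given in the post (deliberately not guessed).
MY_FORMS = {
    "drinkable": "mak",
    "edible": "ak",
    "land": None,
    "manipulative": None,
}


def my_possession(noun: str, intended_use: str) -> str:
    """Build 'noun + my' where the form of 'my' depends on the intended use."""
    form = MY_FORMS[intended_use]
    if form is None:
        raise ValueError(f"no form recorded here for the {intended_use!r} classifier")
    return f"{noun} {form}"


print(my_possession("ani", "drinkable"))  # my coconut (that I will drink)
print(my_possession("ani", "edible"))     # my coconut (that I will eat)
```

The same noun, ani, goes through the function twice with different results, which is exactly what a gender system would not allow.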

Why do languages have different ways of categorising nouns? How do these systems develop and change over time? Are gender systems easier to learn than classifier systems? Are gender and classifiers completely different systems? Or is there more similarity to them than meets the eye? These are some of the big questions in linguistics and psychology. We are excited to start a new research project at the Surrey Morphology Group, called optimal categorisation: the origin and nature of gender from a psycholinguistic perspective, that seeks to answer these fundamental questions. Over the next three years we will talk more about these fascinating categorisation systems, explain our experimental research methods, introduce the languages and speakers under investigation, and share our findings via this blog. Just look out for the ‘Optimal Categorisation’ headings!

The cat’s mneow: animal noises and human language

As is well known, animals on the internet can have very impressive language skills: cats and dogs in particular are famous for their near-complete online mastery of English, and only highly trained professional linguists (including some of us here at SMG) are able to spot the subtle grammatical and orthographic clues that indicate non-human authorship behind some of the world’s favourite motivational statements.

Recent reports suggest that some of our fellow primates have also learnt to engage in complex discourse: again, the internet offers compelling evidence for this.

But sadly, out in the real world, animals capable of orating on philosophy are hard to come by (as far as we can tell). Instead, from a human point of view, cats, dogs, gorillas etc. just make various kinds of animal noises.

Why write about animals and their noises on a linguistics blog? Well, one good answer would be: the exact relationship between the vocalisations made by animals, on one hand, and the phenomenon of human spoken language, on the other, is a fascinating question, of interest within linguistics but far beyond it as well. So a different blog post could have turned now to discuss the semiotic notion of communication in the abstract; or perhaps the biological evolution of language in our species, complete with details about the FOXP2 gene and the descent of the larynx.

But in fact I am going to talk about something a lot less technical-sounding. This post is about what could be called the human versions of animal noises: that is, the noises that English and other languages use in order to talk about them, like meow and woof, baa and moo.

At this point you may be wondering whether there is much to be gained by sitting around and pondering words like moo. But what I have in mind here is this kind of thing:

These are good fun, but they also raise a question. If pigs and ducks are wandering around all over the world making pig and duck noises respectively, then how come we humans appear to have such different ideas about what they sound like? Oink cannot really be mistaken for nöff or knor, let alone buu. And the problem is bigger than that: even within a single language, English, frogs can go both croak and ribbit; dogs don’t just go woof, but they also yap and bark. These sound nothing like each other. What is going on? Are we trying to do impressions of animals, only to discover that we are not very good at it?

Before going any further I should deal with a couple of red herrings (to stick with the zoological theme). For one thing, languages may appear to disagree more than they really do, just because their speakers have settled on different spelling conventions: a French coin doesn’t really sound all that different from an English quack. And sometimes we may not all be talking about the same sound in the first place. Ribbit is a good depiction of the noise a frog makes if it happens to belong to a particular species found in Southern California – but thanks to the cultural influence of Hollywood, ribbit is familiar to English speakers worldwide, even though their own local frogs may sound a lot more croaky. Meanwhile, it is easy to picture the difference between the kind of dog that goes woof and the kind that goes yap.

But even when we discount this kind of thing, there are still plenty of disagreements remaining, and they pose a puzzle bound up with linguistics. A fundamental feature of human language, famously pointed out by Saussure, is that most words are arbitrary: they have nothing inherently in common with the things they refer to. For example, there is nothing actually green about the sound of the word green – English has just assigned that particular sound sequence to that meaning, and it’s no surprise to find that other languages haven’t chosen the same sounds to do the same job. But right now we are in the broad realm of onomatopoeia, where you might not expect to find arbitrariness like this. After all, unlike the concept of ‘green’, the concept of ‘quack’ is linked to a real noise that can be heard out there in the world: why would languages bother to disagree about it?


First off, it is worth noticing that not all words relating to animal noises work in the same way. Think of cock-a-doodle-doo and crow. Both of these are used in English of the distinctive sound made by a cockerel, and there is something imitative about them both. But there is a difference between them: the first is used to represent the sound itself, whereas the second is the word that English uses to talk about producing it. That is, as English sees it, the way a cock crows is by ‘saying’ cock-a-doodle-doo, and never vice versa. Similarly, the way that a dog barks is by ‘saying’ woof. The representations of the sounds, cock-a-doodle-doo and woof, are practically in quotation marks, as if capturing the animals’ direct speech.

This gives us something to run with. After all, think about the work that words like crow and bark have to do. As they are verbs, you need to be able to change them according to person (they bark but it barks), tense, and so on. So regardless of their special function of talking about noises, they still have to operate like any other verb, obeying the normal grammar rules of English. Since every language comes with its own grammatical requirements and preferences about how words can be structured and manipulated (that is, its own morphology), this can explain some kinds of disparity across languages. For example, what we onomatopoeically call a cuckoo is a kukushka in Russian, featuring a noun-forming element shka which makes the word easier to deal with grammatically – but also makes it sound very Russian. Maybe it is this kind of integration into each language that makes these words sound less true to life and more varied from one language to another?

This is a start, but it must be far from the whole story. Animal ‘quotes’ like woof and cock-a-doodle-doo don’t need to interact all that much with English grammar at all. Nonetheless, they are clearly the English versions of the noises we are talking about:

And as we’ve already seen, the same goes for quack and oink. So even when it looks like we might just be ‘doing impressions’ of non-linguistic sounds, every language has its own way of actually doing those impressions.

Reassuringly, at least we are not dealing with a situation of total chaos. Across languages, duck noises reliably contain an open a sound, while pig noises reliably don’t. And there is widespread agreement when it comes to some animals: cows always go moo, boo or similar, and sheep are always represented as producing something like meh or beh – this is so predictable that it has even been used as evidence for how certain letters were pronounced in Ancient Greek. So languages are not going out of their way to disagree with each other. But this just sharpens up the question. For obvious biological reasons, humans can never really make all the noises that animals can. But given that people the world over sometimes converge on a more or less uniform representation for a given noise, why doesn’t this always happen?

In their feline wisdom, the cats of the Czech Republic can give us a clue. Like sheep, cats sound pretty similar in languages across the globe, and in Europe they are especially consistent. In English, they go meow; in German, it is miau; in Russian, myau; and so on. But in Czech, they go mňau (= approximately mnyau), with a mysterious n-sound inside. The reason is that at some point in the history of Czech, a change in pronunciation affected every word containing a sequence my, so that it came out as mny instead. Effectively, for Czech speakers from then on, the option of saying myau like everyone else was simply off the table, because the language no longer allowed it – no matter what their cats sounded like.
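As a toy illustration of why the change left Czech cats no choice, here is a sketch that applies the change as a blanket rewrite. It works on the rough English-style spellings used above (my, mny) rather than on real Czech orthography or a proper phonetic transcription, and the function name `apply_czech_change` is invented for the purpose.

```python
def apply_czech_change(word: str) -> str:
    """Rewrite every 'my' sequence as 'mny', in the rough spelling used above."""
    return word.replace("my", "mny")


# The pan-European cat noise goes in, the Czech one comes out.
print(apply_czech_change("myau"))

# Words without the sequence are left alone.
print(apply_czech_change("miau"))
```

Because the rule applies to every word regardless of meaning, an onomatopoeic word gets no special exemption for sounding like a cat.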

What does this example illustrate? First of all – as well as a morphology, each language has a phonology (sound structure), which constrains its speakers tightly: no language lets people use all the sounds they are physically able to make, and even the available sounds are only allowed to join up in certain combinations. So each language has to come up with a way of dealing with non-linguistic noises which will suit its own idea of what counts as a legitimate syllable. Moo is one thing, but it’s harder to find a language that allows syllables resembling the noise a pig makes… so each language compromises in its own way, resulting in nöff, knor, oink etc., none of which capture the full sonic experience of the real thing.

And second – things like oink, woof and mňau really must be words in the full sense. They aren’t just a kind of quotation, or an imitation performed off the cuff; instead they belong in a speaker’s mental dictionary of their own language. That is why, in general, they have to abide by the same phonological rules as any other word. And that also explains where the arbitrariness comes in: as with any word, language learners just notice that that is the way their own community expresses a shared concept, and from then on there is no point in reinventing the wheel. You don’t need to try hard to get a duck’s quack exactly right in order to talk about it – as long as other people know what you mean, the word has done its job.

So what speakers might lose in accuracy this way, they make up for in efficiency, by picking a predetermined word that they know fellow speakers will recognise. Only when you really want to draw attention to a sound is it worth coming up with a new representation of it and ignoring the existing consensus. To create something truly striking, perhaps you need to be a visionary like James Joyce, who wrote the following line of ‘dialogue’ for a cat in Ulysses, giving short shrift to English phonology in the process:



What’s the good of ‘would of’?

As schoolteachers the English-speaking world over know well, the use of of instead of have after modal verbs like would, should and must is a very common feature in the writing of children (and many adults). Some take this as an omen of the demise of the English language, and would perhaps agree with Fowler’s colourful assertion in A Dictionary of Modern English Usage (1926) that “of shares with another word of the same length, as, the evil glory of being accessory to more crimes against grammar than any other” (though admittedly this use of of has been hanging around for a while without doing any apparent harm: this study finds one example as early as 1773, and another almost half a century later in a letter of the poet Keats).

According to the usual explanation, this is nothing more than a spelling mistake. Following ‘would’, ‘could’ etc., the verb have is usually pronounced in a reduced form as [əv], usually spelt would’ve, must’ve, and so on. It can even be reduced further to [ə], as in shoulda, woulda, coulda. This kind of phonetic reduction is a normal part of grammaticalisation, the process by which grammatical markers evolve out of full words. Given the famous unreliability of English spelling, and the fact that these reduced forms of have sound identical to reduced forms of the preposition of (as in a cuppa tea), writers can be forgiven for mistakenly inferring the following rule:

‘what you hear/say as [əv] or [ə], write as of’.

But if it’s just a spelling mistake, this use of ‘of’ is surprisingly common in respectable literature. The examples below (from this blog post documenting the phenomenon) are typical:

‘If I hadn’t of got my tubes tied, it could of been me, say I was ten years younger.’ (Margaret Atwood, The Handmaid’s Tale)

‘Couldn’t you of – oh, he was ignorant in his speech – couldn’t you of prevented it?’ (Hilary Mantel, Beyond Black)

Clearly neither these authors nor their editors make careless errors. They consciously use ‘of’ instead of ‘have’ in these examples for stylistic effect. This is typically found in dialogue to imply something about the speaker, be it positive (i.e. they’re authentic and unpretentious) or negative (they are illiterate or unsophisticated).


These examples look like ‘eye dialect’: the use of nonstandard spellings that correspond to a standard pronunciation, and so seem ‘dialecty’ to the eye but not the ear. This is often seen in news headlines, like the Sun newspaper’s famous proclamation “it’s the Sun wot won it!” announcing the surprise victory of the Conservatives in the 1992 general election. But what about sentences like the following from the British National Corpus?

“If we’d of accepted it would of meant we would have to of sold every stick of furniture because the rooms were not large enough”

The BNC is intended as a neutral record of the English language in the late 20th century, containing 100 million words of carefully transcribed and spellchecked text. As such, we expect it to have minimal errors, and there is certainly no reason it should contain eye dialect. As Geoffrey Sampson explains in this article:

“I had taken the of spelling to represent a simple orthographic confusion… I took this to imply that cases like could of should be corrected to could’ve; but two researchers with whom I discussed the issue on separate occasions felt that this was inappropriate – one, with a language-teaching background, protested vigorously that could of should be retained because, for the speakers, the word ‘really is’ of rather than have.”

In other words, some speakers have not just reinterpreted the rules of English spelling, but the rules of English grammar itself. As a result, they understand expressions like should’ve been and must’ve gone as instances of a construction containing the preposition of instead of the verb have:

Modal verb (e.g. must, would…) + of + past participle (e.g. had, been, driven…)
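If you wanted to go hunting for this construction in a pile of text, a rough first pass might look like the sketch below. It is only a sketch under stated assumptions: the list of modals is my own short selection, the pattern and the name `find_modal_of` are invented for illustration, and it does not check that the word after of really is a past participle, so any hits would still need reading by a human.

```python
import re

# A short, illustrative list of modal verbs (not exhaustive).
MODALS = ["would", "could", "should", "must", "might"]

# modal (optionally negated with n't or n’t) + "of" + the following word
MODAL_OF = re.compile(
    r"\b(?:" + "|".join(MODALS) + r")(?:n['’]t)?\s+of\s+\w+",
    re.IGNORECASE,
)


def find_modal_of(text: str) -> list:
    """Return every 'modal + of + next word' sequence found in the text."""
    return [match.group(0) for match in MODAL_OF.finditer(text)]


sentence = (
    "If we’d of accepted it would of meant we would have to of sold "
    "every stick of furniture"
)
print(find_modal_of(sentence))
```

On the BNC sentence above this picks out only would of meant. It skips we’d of and to of, which do not begin with a full modal, and it correctly ignores the ordinary preposition in stick of furniture.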

One way of testing this theory is to look at pronunciation. Of can receive a full pronunciation [ɒv] (with the same vowel as in hot) when it occurs at the end of a sentence, for example ‘what are you dreaming of?’. So if the word ‘really is’ of for some speakers, we ought to hear [ɒv] in utterances where of/have appears at the end, such as the sentence below. To my mind’s ear, this pronunciation sounds okay, and I think I even use it sometimes (although intuition isn’t always a reliable guide to your own speech).

I didn’t think I left the door open, but I must of.

The examples below from the Audio BNC, both from the same speaker, are transcribed as of but clearly pronounced as [ə] or [əv]. In the second example, of appears to be at the end of the utterance, where we might expect to hear [ɒv], although the amount of background noise makes it hard to tell for sure.

 “Should of done it last night when it was empty then” (audio) (pronounced [ə], i.e. shoulda)

(phone rings) “Should of.” (audio) (pronounced [əv], i.e. should’ve)

When carefully interpreted, writing can also be a source of clues on how speakers make sense of their language. If writing have as of is just a linguistically meaningless spelling mistake, why do we never see spellings like pint’ve beer or a man’ve his word? (Though we do, occasionally, see sort’ve or kind’ve.) This otherwise puzzling asymmetry is explained if the spelling of in should of etc. is supported by a genuine linguistic change, at least for some speakers. Furthermore, have only gets spelt of when it follows a modal verb, but never in sentences like the dogs have been fed, although the pronunciation [əv] is just as acceptable here as in the dogs must have been fed (and in both cases have can be written ‘ve).

If this nonstandard spelling reflects a real linguistic variant (as this paper argues), this is quite a departure from the usual role of a preposition like of, which is typically followed by a noun rather than a verb. The preposition to is a partial exception, because while it is followed by a noun in sentences like we went to the party, it can also be followed by a verb in sentences like we like to party. But with to, the verb must appear in its basic infinitive form (party) rather than the past participle (we must’ve partied too hard), making it a bit different from modal of, if such a thing exists.

She must’ve partied too hard

Whether or not we’re convinced by the modal-of theory, it’s remarkable how often we make idiosyncratic analyses of the language we hear spoken around us. Sometimes these are corrected by exposure to the written language: I remember as a young child having my spelling corrected from storbry to strawberry, which led to a small epiphany for me, as that was the first time I realised the word had anything to do with either straw or berry. But many more examples slip under the radar. When these new analyses lead to permanent changes in spelling or pronunciation we sometimes call them folk etymology, as when the Spanish word cucaracha was misheard by English speakers as containing the words cock and roach, and became cockroach (you can read more about folk etymology in earlier posts by Briana and Matthew).

Meanwhile, if any readers can find clear evidence of modal of with the full pronunciation as [ɒv], please comment below! I’m quite sure I’ve heard it, but solid evidence has proven surprisingly elusive…

No we [kæn]

If something bad happened to someone you hold in contempt, would you give a fig, a shit or a flying f**k? While figs might be a luxury food item in Britain, their historical status as something that is valueless or contemptible puts them on the same level as crap, iotas and rats’ asses for the purposes of caring.

In English, we have a wide range of tools for expressing apathy. But we don’t always agree on how to express it, and even use seemingly opposite affirmative and negative sentences to express very similar concepts. Consider the confusing distinction between ‘I couldn’t care less’ and ‘I could care less’, which are used in identical contexts by British and American speakers of English to mean pretty much the same thing. This mind-boggling pattern makes sense when we realise that those cold-hearted people who couldn’t care less have a care-factor of zero, while the others don’t care much, but could do so even less, if necessary.

Putting aside such oddities, negation is normally crucial to interpreting a sentence – words like ‘not’ determine whether the rest of the sentence is affirmative or negative (i.e. whether you’re claiming it is true or false). Accordingly, languages tend to mark negation clearly, sometimes in more than one place within a sentence. One of the world’s most robust languages in this respect is Bierebo, an Austronesian language spoken in Vanuatu, where no fewer than three words for expressing negation are required at once (Budd 2010: 518):

Mara   a-sa-yal              re         manu  dupwa  pwel.
NEG1   3PL.S-eat-find   NEG2  bird     ANA      NEG3
‘They didn’t get to eat the bird.’

While marking negation three times might seem a little inefficient, this pales in comparison to the problems that arise when you don’t clearly indicate it at all. We only have to turn to English to see this at work, where the distinction between Received Pronunciation can [kæn] and can’t [kɑ:nt] is frequently imperceptible in American varieties where final /t/ is not released, resulting in [kæn] or [kən] in both affirmative and negative contexts.

You might think that once a word or affix or sound that indicates negation has been removed from a word, there isn’t anywhere else to go. But some Dravidian languages spoken in India really push the boat out in this respect. Instead of adding some sort of negative word or affix to an affirmative sentence to signal negation, the tense affix (past -tt or future -pp) is taken away, as shown by the contrast between literary Tamil affirmatives and negatives.

pati-tt-ēn                    pati-pp-ēn                  patiy-ēn
‘I learned.’                 ‘I will learn.’               ‘I do/did/will not learn.’
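To make the logic of the pattern explicit, here is a minimal sketch that builds just these three forms. It is hard-wired to the one verb cited above: the stem, the ending and the y that appears in patiy-ēn are simply copied from the forms as given, and the function name `i_learn` is invented. It should not be read as a general account of Tamil verbs.

```python
# Tense affixes as cited above: past -tt, future -pp.
TENSE_AFFIX = {"past": "tt", "future": "pp"}

STEM = "pati"   # 'learn', as segmented in the cited forms
ENDING = "ēn"   # first person singular ending in the cited forms


def i_learn(tense=None, negative=False):
    """Build the cited literary Tamil forms for 'I learn' in a given tense."""
    if negative:
        # Negation is signalled by leaving the tense affix out altogether,
        # so tense is ignored here. The 'y' is copied from the cited form.
        return f"{STEM}y-{ENDING}"
    return f"{STEM}-{TENSE_AFFIX[tense]}-{ENDING}"


print(i_learn("past"))           # 'I learned.'
print(i_learn("future"))         # 'I will learn.'
print(i_learn(negative=True))    # 'I do/did/will not learn.'
```

The negative branch never looks at the tense argument, which is why a single negative form corresponds to do, did and will not in the translation.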

This is highly unusual from a linguistic point of view, and it’s tempting to think that languages avoid this type of negation because it is difficult to learn or doesn’t make sense design-wise. But historical records show similar patterns have been attested across Dravidian languages for centuries. This demonstrates that inflection patterns of this kind can be highly sustainable when they come about – so we might be stuck with the can/can’t collapse for a while to come.