Morphological Redundancy – Why say something twice when once will do?

In Batsbi (a language spoken in the Caucasus in North-East Georgia), if you want to say ‘she is ripping the dress’ you might say something like yoxyoyanw k’ab. In this word, each instance of ‘y’ indicates that it is indeed just one dress that she is ripping.

Linguists call this phenomenon multiple exponence, where a single meaning is indicated within a word more than once, for no apparent reason. This, when you think about it, is pretty weird. Typically we think of languages as incremental in nature: intuitively, we assume that when we add something to a word or a sentence we are adding meaning to that word or sentence. But in multiple exponence this clearly can’t be the case. The dress in the Batsbi example is no more singular than any other singular object in the world, so why have three ‘y’s’ rather than just the one we would expect?

In other words, why say something twice when once will do? The short answer is we don’t know (yet!) – sorry to disappoint! But what I can answer is a slightly different question: what does it actually mean to say something twice?

Multiple exponence is not the only way you might say something twice within a word. There is another phenomenon known as overlapping exponence, where the same meaning is indicated by multiple markers in a word (as with multiple exponence), but each marker is also doing some other job. For example, in Filomeno Mata Totonac (a language from Mexico) you say ‘you are coming’ using the word tanpaati. This word has two suffixes, paa and ti, both of which mean ‘you’ (second person). However, paa also indicates that the event is progressive (like the English –ing), while the other suffix, ti, indicates that the subject is singular rather than plural. So speakers of this language mention twice that it’s you who is coming, but we couldn’t remove either of the suffixes from the word without affecting the meaning, as both of them also tell us something else about what’s going on.

In Wipi, a language spoken in the Fly River Delta on the south coast of Papua New Guinea, if you want to say that you are building two houses you would use the word arangen, which literally means ‘I build two’. This word is rather interesting since you need both the prefix, a, and the suffix, en, to know that this is indeed only two houses as opposed to some other number of houses. Yet neither of these affixes actually means ‘two’. Instead, the suffix en is ambiguous between one and two; we might say it means less than three. The prefix a, in contrast, is used when you are building two or more houses; in other words, it means more than one. Thus, if you are building more than one house but also less than three, there is only one interpretation: you are building two houses. This is called distributed exponence. It’s remarkable that speakers of Wipi say twice how many houses they are building, but in order to know the exact number of houses, you need to listen both times!
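For readers who like to see the logic spelled out, here is a minimal sketch of the Wipi reasoning as set filtering. It is only an illustration: the function names and the arbitrary cap of five houses are my own assumptions, and the two affix meanings are simply the ones described above.

```python
# Toy illustration of distributed exponence in the Wipi example.
# Assumption: we only consider between 1 and 5 houses (the cap is arbitrary).
POSSIBLE_COUNTS = set(range(1, 6))

def prefix_a(counts):
    """Prefix a: 'more than one'."""
    return {n for n in counts if n > 1}

def suffix_en(counts):
    """Suffix en: 'less than three'."""
    return {n for n in counts if n < 3}

only_prefix = prefix_a(POSSIBLE_COUNTS)      # still several options left
only_suffix = suffix_en(POSSIBLE_COUNTS)     # still ambiguous between 1 and 2
both = prefix_a(suffix_en(POSSIBLE_COUNTS))  # only one option survives

print(only_prefix, only_suffix, both)
```

Neither filter on its own picks out a single number; only their combination leaves exactly two houses.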

The Fly River Delta

It’s amazing really, when you look closely at a simple question like what does it mean to say something twice?, that there is such complexity and diversity in the answer. Beyond what we saw, there are all sorts of in-between cases and the multiple types can interact. As such, teasing them apart can be a real challenge. When I say something twice, it might be that each time gives you more information in subtly different ways. It is untying this kind of subtle diversity which hopefully gives us some hint as to why speakers and languages would ever do such a thing to begin with.

Sense and polarity, or why meaning can drive language change

Generally a sentence can be negative or positive depending on what one actually wants to express. Thus if I’m asked whether I think that John’s new hobby – say climbing – is a good idea, I can say It’s not a good idea; conversely, if I do think it is a good idea, I can remove the negation not to make the sentence positive and say It’s a good idea. Both sentences are perfectly acceptable in this context.

From such an example, we might therefore conclude that any sentence can be made positive by removing the relevant negative word – most often not – from the sentence. But if that is the case, why is the non-negative response I like it one bit unacceptable, or at least odd, when its negative counterpart I don’t like it one bit is perfectly acceptable and natural?

This contrast has to do with the expression one bit: notice that if it is removed, then both negative and positive responses are perfectly fine: I could respond I don’t like it or, if I do like it, I (do) like it.

It seems that there is something special about the phrase one bit: it wants to be in a negative sentence. But why? It turns out that this question is a very big puzzle, not only for English grammar but for the grammar of most (all?) languages. For instance, in French the expression bouger/lever le petit doigt ‘lift a finger’ must appear in a negative sentence. Thus if I know that John wanted to help with your house move and I ask you how it went, you could say Il n’a pas levé le petit doigt (lit. ‘He didn’t lift the small finger’) if he didn’t help at all, but you could not say Il a levé le petit doigt (lit. ‘He lifted the small finger’) even if he did help to some extent.

Expressions like lever le petit doigt ‘lift a finger’, one bit, care/give a damn, own a red cent are said to be polarity sensitive: they only really make sense if used in negative sentences. But this in itself is not the most interesting property.

What is much more interesting is why they have this property. There is a lot of research on this question in theoretical linguistics. The proposals are quite technical but they all start from the observation that most expressions that need to be in a negative context to be acceptable are expressions of minimal degrees and measures. For instance, a finger or le petit doigt ‘the small finger’ is the smallest body part one can lift to do something, a drop (in the expression I didn’t drink a drop of vodka yesterday) is the smallest observable quantity of vodka, etc.

Regine Eckardt, who has worked on this topic, formulates the following intuition: ‘speakers know that in the context of drinking, an event of drinking a drop can never occur on its own – even though a lot of drops usually will be consumed after a drinking of some larger quantity.’ (Eckardt 2006, p. 158). However, the intuition goes, the occurrence of this expression in a negative sentence is acceptable because it denies the existence of events that consist of just drinking one drop.

What this means is that if Mary drank a small glass of vodka yesterday, although it is technically true to say She drank a drop of vodka (since the glass contains many drops) it would not be very informative, certainly not as informative as saying the equally true She drank a glass of vodka.

However, imagine now that Mary didn’t drink any alcohol at all yesterday. In this context, I would be telling the truth if I said either one of the following sentences: Mary didn’t drink a glass of vodka or Mary didn’t drink a drop of vodka. But now it is much more informative to say the latter. To see this, consider the following: saying Mary didn’t drink a glass of vodka could describe a situation in which Mary didn’t drink a glass of vodka yesterday but she still drank some vodka, maybe just a spoonful. If, however, I say Mary didn’t drink a drop of vodka, then this can only describe a situation where Mary didn’t drink a glass or even a little bit of vodka. In other words, saying Mary didn’t drink a drop of vodka yesterday is more informative than saying Mary didn’t drink a glass of vodka yesterday because the former sentence describes a very precise situation whereas the latter is a lot less specific as to what it describes (i.e. it could be uttered in a situation in which Mary drank a spoonful of vodka or maybe a cocktail that contains 2 ml of vodka, etc.).
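The informativeness argument can be made concrete with a toy model. This is only a sketch under my own assumptions: a ‘situation’ is reduced to the amount of vodka Mary drank, the amounts and the sizes of a drop and a glass are made-up values, and ‘drank X’ is read as ‘drank at least X’. The fewer situations a sentence is compatible with, the more informative it is.

```python
# Toy model: a situation is how much vodka Mary drank, in millilitres.
# All numbers are illustrative assumptions, not measurements.
SITUATIONS = [0, 2, 15, 40, 80]  # nothing, cocktail, spoonful, glass, two glasses
DROP_ML = 1
GLASS_ML = 40

def drank(at_least_ml):
    """Situations in which 'Mary drank X' is true (X read as 'at least X')."""
    return {s for s in SITUATIONS if s >= at_least_ml}

def did_not_drink(at_least_ml):
    """Situations in which 'Mary didn't drink X' is true."""
    return {s for s in SITUATIONS if s < at_least_ml}

# Positive sentences: 'a glass' rules out more situations than 'a drop'.
print(len(drank(DROP_ML)), len(drank(GLASS_ML)))

# Negative sentences: the pattern flips, and 'a drop' is the precise one.
print(did_not_drink(DROP_ML), did_not_drink(GLASS_ML))
```

In this model the negative sentence with a drop is compatible with exactly one situation (Mary drank nothing), while the one with a glass still allows the spoonful and the cocktail.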

By using expressions of minimal degrees/measures in negative environments, the sentences become a lot more informative. This, it seems, is part of the reason why languages like English have changed such that these words are now only usable in negative sentences.

Double trouble treble

You’ll get in trouble if you drink a tripel, the strong pale ale brewed by the most hipster of monks, the Trappists.

The Lowlands are the Hoxton of Europe

Tripels have three times the strength (around 8–10% ABV) of the standard table beer historically consumed by the monks themselves. This enkel or ‘single’ beer was traditionally not available outside the cloisters, while the dubbel (a double strength dark brown beer made with caramelized beet sugar) was sold to provide income for the monastery. Although the term enkel is no longer in common beer parlance (it is on the cusp of a comeback), dubbel and tripel have held their ground. It is generally thought that the tripel takes its name from its threefold strength, but it is also sometimes claimed that it is because it has three times the malt of a regular brew. A quadrupel is VERY strong.

As we have seen already in this blog when counting sheep in Slovenian and yams in Ngkolmpu, means for the expression of quantities and multiplication are often linguistically fascinating. Not least the doublet treble and triple, which originate from the same etymological source.

The Latin word triplus ‘threefold, triple’ first entered English via Old French treble. Not satisfied with claiming the space previously occupied by the Old English adjective þrifeald ‘threefold’, it turned up again by the 15th century as the adjective triple.

This triad of modifiers (threefold, treble and triple) exemplifies some of the pathways by which lexical synonymy can come about. The first word was formed through a compounding process (i.e. the numeral three forming a new word with the multiplicative form –fold), the second entered the language through direct borrowing, and the third through a second wave of borrowing (either from Old French triple or Latin triplus).

We don’t just find words competing to express the same meaning, but also parts of words. The –fold element of threefold, tenfold and manifold, and the –plus of triplus, are argued to have developed from the same Proto-Indo-European root *pel ‘to fold’. To complicate things even further, the now obsolete treblefold was attested between the 14th and 16th centuries. Words, it seems, like to fight for the same space, and can sometimes be incestuous.

Since entering English over 500 years ago, triple and treble have staked out different paths, but retained similar meanings in at least some of their manifestations, as explored by Catherine Soanes on the OxfordWords blog. In terms of frequency, triple is the stronger twin (or is it a triplet? quadruplet?), ending up triumphant with around 6 times more occurrences in the Oxford English Corpus.

But treble has some resilience. Although the official Scrabble board has double and triple word scores, treble word scores are occasionally referred to on the net (albeit erroneously, or in a devil-may-care way), such as in Charlie Brooker’s article on how to cheat at Scrabble. I even found a ‘threefold word score’ on a Scrabble knock-off site. Lawyers to the ready!

This demonstrates that these adjectives really are semantically interchangeable for the most part, even though their distributions are not identical.

The take home? While not every monastery sells the same tripel, they will all get you drunk.

Werewolves

Hallowe’en will soon be upon us, so it is only right we turn our attention to monsters. Consider the werewolf. It’s a wolf, sort of, as the name indicates, but what’s a were? The usual assumption is that it’s a leftover of an older word meaning ‘man’ that fell completely out of fashion by the 14th century. As a result we have what looks like a compound word, except that one of the parts doesn’t have any meaning on its own. Perhaps not, but that hasn’t stopped people from squeezing some value out of it nonetheless: if a werewolf is a person who turns into a wolf — or at any rate, part person, part wolf — then a were-bear is a mixture of person and bear, and so on down to were-turtles.

Actually, people don’t seem to be that literal-minded when it comes to word meanings, if the various were-creatures in circulation are any evidence. The monster from “Wallace and Gromit: Curse of the Were-Rabbit” is not half-human, half-rabbit, but more just kind of a monster rabbit, with a thicker pelt. (Visually calqued, I suspect, from the not-particularly wolf-like wolfman of the wolfman movies featuring Lon Chaney Jr.)

And were-fleas, to the extent that they exist, appear to be carriers of lycanthropism rather than human/insect conglomerates. None of this is yet reflected in the Oxford English Dictionary’s entry on were– (you need a subscription for that but it’s free if you have a UK public library card!). Give it a few decades more maybe.

Strangely, words for werewolf in other languages share a propensity for being compounds made up of ‘wolf’ plus some other completely opaque element. The first part of Czech vlkodlak is vlk, which means ‘wolf’, but dlak on its own is not an independent word. (Not in Czech at any rate, but in the related language Slovenian the equivalent word volkodlak is clearly made up of volk ‘wolf’ and dlaka, which means ‘hair’ or ‘fur’.) And the French werewolf, loup-garou, has the word for ‘wolf’ in it (loup), but garou is not an independent word (other than being an unrelated homonym meaning ‘flax-leaved daphne’). That part seems to have been our very own Germanic word werewolf borrowed at an early date (earliest attestation as garwall from the 12th century). Both of these have, like werewolf, given rise to further monstrous hybrids like Czech prasodlak, from prase ‘pig’, or the French cochon-garou.

In fact, Czech and French have gone one step further than English. Though I just wrote that dlak and garou were not words, that was being a bit pedantic. Neither of them is listed in the authoritative Academy dictionaries of Czech and French, but nonetheless they do seem to have split off from their host body, rather as happened – if we can be permitted to mix monster metaphors – to the hero of 1959’s “The Manster” (a.k.a. “The Split”).

For example, this Czech website tells us about vlkodlaci i jiní dlaci ‘werewolves and other were-creatures’ (dlaci is the plural of dlak), and in French the phrase courir le garou ‘run the garou’ used, at least, to be in circulation, meaning basically ‘go around at night being a werewolf’. That use in turn apparently spawned a verb garouter, meaning much the same thing. The curse lives on.

Optimal Categorisation: How do we categorise the world around us?

People love to categorise! We do this on a daily basis, consciously and subconsciously. When we are confronted with something new we try to figure out what it is by comparing it to something we already know. Say, for instance, I saw something flying through the air – I may think to myself that the object is a bird, or I may say it is a plane based on my previous experiences of birds and planes. Of course the object may turn out to be something completely new, perhaps even Superman!

Is it a bird? Is it a plane? No it’s Superman!

Our love of classification runs deep in scientific enquiry. Botanists and zoologists classify plants and animals into different taxonomies. Even the humble linguist loves to classify – is this new word a noun or a verb? What about the new word zoodle that was recently added to the Merriam-Webster dictionary? Is it a thing? Or an action? Can I zoodle something or is it something I can pick up and touch? Well apparently zoodle is a noun which means ‘a long, thin strip of zucchini that resembles a string or narrow ribbon of pasta’. To be honest, I love eating zoodles, though until now I never knew what they were called!

The way people classify entities around them has become encoded in the different languages we speak in many different ways. The most obvious example that springs to mind is when we learn a new language, like French or German, we are confronted with a grammatical gender system. French has two genders – Masculine and Feminine. But German has three – Masculine, Feminine and Neuter. Other languages can have many more gender distinctions. Fula, a language spoken in west and central Africa, has twenty different gender categories!

So what exactly are grammatical gender systems and how are they realised in different languages? Gender systems categorise nouns into different groups and tend to appear not on the noun itself, but on other elements in the phrase. In German, nouns are split into three different gender categories – masculine, feminine and neuter. The gender of a noun is shown by using different articles (the word ‘the’ or ‘a’) and sometimes by changing the ending of an adjective, but never on the noun itself. Thus the word for ‘the’ in German is either der, die or das depending on whether the noun in the phrase is masculine, feminine or neuter.

(1)        der       Mann
              the       man

(2)        die        Frau
              the       woman

(3)        das       Haus
              the       house

This is called ‘agreement’ as the adjectives and articles must agree with the gender of the noun. In a language with gender, each noun typically can only occur in one gender category.

Not every language has a grammatical gender system, but they are highly pervasive, with around 40% of all languages having such a system. English is quite a poor example when it comes to gender. There is no real gender agreement in English, with the exception of pronouns. We have to say: Bill walked into the grocers. He bought some apples. Here the pronoun he must agree with the gender of the noun that was previously mentioned. English uses he, she and it as the only markers of gender agreement.

Languages behave differently in how they allocate nouns to the different genders, which can be very baffling for language learners! Why in French is chair feminine, la chaise, but in German it is masculine, der Stuhl? How a language allocates nouns to its gender categories can seem somewhat arbitrary; the words for women and men, which fall into the feminine and masculine genders, are the only semantically obvious choices.

But wait! If you thought the English gender system was dull, think again! A couple of months ago my piano was being restored and when it was being moved back into the lounge the piano movers kept saying: “pull her a little bit more” and “turn her this way”. The movers used the female pronouns to describe the piano. In English, countries, pianos, ships and sometimes even cars use the feminine pronouns.

Grammatical gender isn’t the only way languages classify nouns. Some languages use words called classifiers to categorise nouns. Classifiers are similar to English measure terms, which categorise the noun in terms of its quantity, such as ‘sheet of paper’ vs. ‘pack of paper’ or ‘slice of bread’ vs. ‘loaf of bread’. Classifiers are found in languages all over the world and are able to categorise nouns depending on the shape, size, quantity or use of the referent, e.g. ‘animal kangaroo’ (alive) vs. ‘meat kangaroo’ (not alive). Classifier systems are very different to gender systems as nouns in a language with classifiers can appear with different classifiers depending on what property of the noun you wish to highlight. There are many different types of classifier systems, but to keep things short I am just going to talk about possessive classifiers, which are mainly found in the Oceanic languages, spoken in the South Pacific.

When an item is in your possession we use possessive pronouns in English to say who the item belongs to. For instance if I say ‘my coconut’ – the possessive pronoun is my. In many Oceanic languages a noun can occur with different forms for the word my depending on how the owner intends to use it. For instance the Paamese language, spoken in Vanuatu, has four possessive classifiers, and I could use the ‘drinkable’ classifier if I was talking about my coconut that I was going to drink. I would use the ‘edible’ classifier if I was going to eat my coconut. I would use the classifier for ‘land’ if I was talking about the coconut growing in my garden. Finally, I could use the ‘manipulative’ classifier if I was going to use my coconut for some other purpose – perhaps to sit on!

(4)        ani                   mak
              coconut           my.drinkable
              ‘my coconut (that I will drink)’

(5)        ani                   ak
              coconut           my.edible
              ‘my coconut (that I will eat)’
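One way to picture the difference from a gender system is as a lookup keyed on intended use rather than on the noun. The sketch below is hypothetical: the helper name my_coconut is my own, and it only includes the two forms cited in examples (4) and (5), since the ‘land’ and ‘manipulative’ forms are not given here.

```python
# Only the two forms cited in examples (4) and (5) are included.
# The 'land' and 'manipulative' forms are not given in this post.
MY_CLASSIFIERS = {"drinkable": "mak", "edible": "ak"}

def my_coconut(intended_use: str) -> str:
    """Build 'my coconut' for a given intended use (hypothetical helper)."""
    if intended_use not in MY_CLASSIFIERS:
        raise KeyError(f"no form for {intended_use!r} is given in this post")
    return f"ani {MY_CLASSIFIERS[intended_use]}"

print(my_coconut("drinkable"))
print(my_coconut("edible"))
```

The noun stays the same in both calls; what changes the possessive form is what the owner plans to do with the coconut.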

Why do languages have different ways of categorising nouns? How do these systems develop and change over time? Are gender systems easier to learn than classifier systems? Are gender and classifiers completely different systems? Or is there more similarity to them than meets the eye? These are some of the big questions in linguistics and psychology. We are excited to start a new research project at the Surrey Morphology Group, called optimal categorisation: the origin and nature of gender from a psycholinguistic perspective, that seeks to answer these fundamental questions. Over the next three years we will talk more about these fascinating categorisation systems, explain our experimental research methods, introduce the languages and speakers under investigation, and share our findings via this blog. Just look out for the ‘Optimal Categorisation’ headings!

The cat’s mneow: animal noises and human language

As is well known, animals on the internet can have very impressive language skills: cats and dogs in particular are famous for their near-complete online mastery of English, and only highly trained professional linguists (including some of us here at SMG) are able to spot the subtle grammatical and orthographic clues that indicate non-human authorship behind some of the world’s favourite motivational statements.

Recent reports suggest that some of our fellow primates have also learnt to engage in complex discourse: again, the internet offers compelling evidence for this.

But sadly, out in the real world, animals capable of orating on philosophy are hard to come by (as far as we can tell). Instead, from a human point of view, cats, dogs, gorillas etc. just make various kinds of animal noises.

Why write about animals and their noises on a linguistics blog? Well, one good answer would be: the exact relationship between the vocalisations made by animals, on one hand, and the phenomenon of human spoken language, on the other, is a fascinating question, of interest within linguistics but far beyond it as well. So a different blog post could have turned now to discuss the semiotic notion of communication in the abstract; or perhaps the biological evolution of language in our species, complete with details about the FOXP2 gene and the descent of the larynx.

But in fact I am going to talk about something a lot less technical-sounding. This post is about what could be called the human versions of animal noises: that is, the noises that English and other languages use in order to talk about them, like meow and woof, baa and moo.

At this point you may be wondering whether there is much to be gained by sitting around and pondering words like moo. But what I have in mind here is this kind of thing:

These are good fun, but they also raise a question. If pigs and ducks are wandering around all over the world making pig and duck noises respectively, then how come we humans appear to have such different ideas about what they sound like? Oink cannot really be mistaken for nöff or knor, let alone buu. And the problem is bigger than that: even within a single language, English, frogs can go both croak and ribbit; dogs don’t just go woof, but they also yap and bark. These sound nothing like each other. What is going on? Are we trying to do impressions of animals, only to discover that we are not very good at it?

Before going any further I should deal with a couple of red herrings (to stick with the zoological theme). For one thing, languages may appear to disagree more than they really do, just because their speakers have settled on different spelling conventions: a French coin doesn’t really sound all that different from an English quack. And sometimes we may not all be talking about the same sound in the first place. Ribbit is a good depiction of the noise a frog makes if it happens to belong to a particular species found in Southern California – but thanks to the cultural influence of Hollywood, ribbit is familiar to English speakers worldwide, even though their own local frogs may sound a lot more croaky. Meanwhile, it is easy to picture the difference between the kind of dog that goes woof and the kind that goes yap.

But even when we discount this kind of thing, there are still plenty of disagreements remaining, and they pose a puzzle bound up with linguistics. A fundamental feature of human language, famously pointed out by Saussure, is that most words are arbitrary: they have nothing inherently in common with the things they refer to. For example, there is nothing actually green about the sound of the word green – English has just assigned that particular sound sequence to that meaning, and it’s no surprise to find that other languages haven’t chosen the same sounds to do the same job. But right now we are in the broad realm of onomatopoeia, where you might not expect to find arbitrariness like this. After all, unlike the concept of ‘green’, the concept of ‘quack’ is linked to a real noise that can be heard out there in the world: why would languages bother to disagree about it?

 

First off, it is worth noticing that not all words relating to animal noises work in the same way. Think of cock-a-doodle-doo and crow. Both of these are used in English of the distinctive sound made by a cockerel, and there is something imitative about them both. But there is a difference between them: the first is used to represent the sound itself, whereas the second is the word that English uses to talk about producing it. That is, as English sees it, the way a cock crows is by ‘saying’ cock-a-doodle-doo, and never vice versa. Similarly, the way that a dog barks is by ‘saying’ woof. The representations of the sounds, cock-a-doodle-doo and woof, are practically in quotation marks, as if capturing the animals’ direct speech.

This gives us something to run with. After all, think about the work that words like crow and bark have to do. As they are verbs, you need to be able to change them according to person (they bark but it barks), tense, and so on. So regardless of their special function of talking about noises, they still have to operate like any other verb, obeying the normal grammar rules of English. Since every language comes with its own grammatical requirements and preferences about how words can be structured and manipulated (that is, its own morphology), this can explain some kinds of disparity across languages. For example, what we onomatopoeically call a cuckoo is a kukushka in Russian, featuring a noun-forming element shka which makes the word easier to deal with grammatically – but also makes it sound very Russian. Maybe it is this kind of integration into each language that makes these words sound less true to life and more varied from one language to another?

This is a start, but it must be far from the whole story. Animal ‘quotes’ like woof and cock-a-doodle-doo don’t need to interact all that much with English grammar at all. Nonetheless, they are clearly the English versions of the noises we are talking about:

And as we’ve already seen, the same goes for quack and oink. So even when it looks like we might just be ‘doing impressions’ of non-linguistic sounds, every language has its own way of actually doing those impressions.

Reassuringly, at least we are not dealing with a situation of total chaos. Across languages, duck noises reliably contain an open a sound, while pig noises reliably don’t. And there is widespread agreement when it comes to some animals: cows always go moo, boo or similar, and sheep are always represented as producing something like meh or beh – this is so predictable that it has even been used as evidence for how certain letters were pronounced in Ancient Greek. So languages are not going out of their way to disagree with each other. But this just sharpens up the question. For obvious biological reasons, humans can never really make all the noises that animals can. But given that people the world over sometimes converge on a more or less uniform representation for a given noise, why doesn’t this always happen?

In their feline wisdom, the cats of the Czech Republic can give us a clue. Like sheep, cats sound pretty similar in languages across the globe, and in Europe they are especially consistent. In English, they go meow; in German, it is miau; in Russian, myau; and so on. But in Czech, they go mňau (= approximately mnyau), with a mysterious n-sound inside. The reason is that at some point in the history of Czech, a change in pronunciation affected every word containing a sequence my, so that it came out as mny instead. Effectively, for Czech speakers from then on, the option of saying myau like everyone else was simply off the table, because the language no longer allowed it – no matter what their cats sounded like.
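The Czech change can be pictured as a blanket rewrite rule. The snippet below is a rough sketch that works on the approximate spellings used in this post (my, mny), not on real Czech orthography or phonetic transcription, and the function name is my own.

```python
def czech_my_change(word: str) -> str:
    """Apply the historical change described above: every 'my' becomes 'mny'.

    Operates on this post's rough transliteration, not on Czech spelling.
    """
    return word.replace("my", "mny")

# The cat word gets caught by the rule like any other word containing 'my'.
print(czech_my_change("myau"))  # mnyau
```

The rule has no exception for onomatopoeia, which is the point: once it applied across the board, myau was no longer a possible word.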

What does this example illustrate? First of all – as well as a morphology, each language has a phonology (sound structure), which constrains its speakers tightly: no language lets people use all the sounds they are physically able to make, and even the available sounds are only allowed to join up in certain combinations. So each language has to come up with a way of dealing with non-linguistic noises which will suit its own idea of what counts as a legitimate syllable. Moo is one thing, but it’s harder to find a language that allows syllables resembling the noise a pig makes… so each language compromises in its own way, resulting in nöff, knor, oink etc., none of which capture the full sonic experience of the real thing.

And second – things like oink, woof and mňau really must be words in the full sense. They aren’t just a kind of quotation, or an imitation performed off the cuff; instead they belong in a speaker’s mental dictionary of their own language. That is why, in general, they have to abide by the same phonological rules as any other word. And that also explains where the arbitrariness comes in: as with any word, language learners just notice that that is the way their own community expresses a shared concept, and from then on there is no point in reinventing the wheel. You don’t need to try hard to get a duck’s quack exactly right in order to talk about it – as long as other people know what you mean, the word has done its job.

So what speakers might lose in accuracy this way, they make up for in efficiency, by picking a predetermined word that they know fellow speakers will recognise. Only when you really want to draw attention to a sound is it worth coming up with a new representation of it and ignoring the existing consensus. To create something truly striking, perhaps you need to be a visionary like James Joyce, who wrote the following line of ‘dialogue’ for a cat in Ulysses, giving short shrift to English phonology in the process:

–Mrkgnao!

 

What’s the good of ‘would of’?

As schoolteachers the English-speaking world over know well, the use of of instead of have after modal verbs like would, should and must is a very common feature in the writing of children (and many adults). Some take this as an omen of the demise of the English language, and would perhaps agree with Fowler’s colourful assertion in A Dictionary of Modern English Usage (1926) that “of shares with another word of the same length, as, the evil glory of being accessory to more crimes against grammar than any other” (though admittedly this use of of has been hanging around for a while without doing any apparent harm: this study finds one example as early as 1773, and another almost half a century later in a letter of the poet Keats).

According to the usual explanation, this is nothing more than a spelling mistake. Following ‘would’, ‘could’ etc., the verb have is usually pronounced in a reduced form as [əv], usually spelt would’ve, must’ve, and so on. It can even be reduced further to [ə], as in shoulda, woulda, coulda. This kind of phonetic reduction is a normal part of grammaticalisation, the process by which grammatical markers evolve out of full words. Given the famous unreliability of English spelling, and the fact that these reduced forms of have sound identical to reduced forms of the preposition of (as in a cuppa tea), writers can be forgiven for mistakenly inferring the following rule:

‘what you hear/say as [əv] or [ə], write as of’.

But if it’s just a spelling mistake, this use of ‘of’ is surprisingly common in respectable literature. The examples below (from this blog post documenting the phenomenon) are typical:

‘If I hadn’t of got my tubes tied, it could of been me, say I was ten years younger.’ (Margaret Atwood, The Handmaid’s Tale)

‘Couldn’t you of – oh, he was ignorant in his speech – couldn’t you of prevented it?’ (Hilary Mantel, Beyond Black)

Clearly neither these authors nor their editors make careless errors. They consciously use ‘of’ instead of ‘have’ in these examples for stylistic effect. This is typically found in dialogue to imply something about the speaker, be it positive (i.e. they’re authentic and unpretentious) or negative (they are illiterate or unsophisticated).

 

These examples look like ‘eye dialect’: the use of nonstandard spellings that correspond to a standard pronunciation, and so seem ‘dialecty’ to the eye but not the ear. This is often seen in news headlines, like the Sun newspaper’s famous proclamation “it’s the Sun wot won it!” announcing the surprise victory of the Conservatives in the 1992 general election. But what about sentences like the following from the British National Corpus?

“If we’d of accepted it would of meant we would have to of sold every stick of furniture because the rooms were not large enough”

The BNC is intended as a neutral record of the English language in the late 20th century, containing 100 million words of carefully transcribed and spellchecked text. As such, we expect it to have minimal errors, and there is certainly no reason it should contain eye dialect. As Geoffrey Sampson explains in this article:

“I had taken the of spelling to represent a simple orthographic confusion… I took this to imply that cases like could of should be corrected to could’ve; but two researchers with whom I discussed the issue on separate occasions felt that this was inappropriate – one, with a language-teaching background, protested vigorously that could of should be retained because, for the speakers, the word ‘really is’ of rather than have.”

In other words, some speakers have not just reinterpreted the rules of English spelling, but the rules of English grammar itself. As a result, they understand expressions like should’ve been and must’ve gone as instances of a construction containing the preposition of instead of the verb have:

Modal verb (e.g. must, would…) + of + past participle (e.g. had, been, driven…)

One way of testing this theory is to look at pronunciation. Of can receive a full pronunciation [ɒv] (with the same vowel as in hot) when it occurs at the end of a sentence, for example ‘what are you dreaming of?’. So if the word ‘really is’ of for some speakers, we ought to hear [ɒv] in utterances where of/have appears at the end, such as the sentence below. To my mind’s ear, this pronunciation sounds okay, and I think I even use it sometimes (although intuition isn’t always a reliable guide to your own speech).

I didn’t think I left the door open, but I must of.

The examples below from the Audio BNC, both from the same speaker, are transcribed as of but clearly pronounced as [ə] or [əv]. In the second example, of appears to be at the end of the utterance, where we might expect to hear [ɒv], although the amount of background noise makes it hard to tell for sure.

 “Should of done it last night when it was empty then” (audio) (pronounced [ə], i.e. shoulda)

(phone rings) “Should of.” (audio) (pronounced [əv], i.e. should’ve)

When carefully interpreted, writing can also be a source of clues on how speakers make sense of their language. If writing have as of is just a linguistically meaningless spelling mistake, why do we never see spellings like pint’ve beer or a man’ve his word? (Though we do, occasionally, see sort’ve or kind’ve). This otherwise puzzling asymmetry is explained if the spelling of in should of etc. is supported by a genuine linguistic change, at least for some speakers. Furthermore, have only gets spelt of when it follows a modal verb, but never in sentences like the dogs have been fed, although the pronunciation [əv] is just as acceptable here as in the dogs must have been fed (and in both cases have can be written ‘ve).

If this nonstandard spelling reflects a real linguistic variant (as this paper argues), this is quite a departure from the usual role of a preposition like of, which is typically followed by a noun rather than a verb. The preposition to is a partial exception, because while it is followed by a noun in sentences like we went to the party, it can also be followed by a verb in sentences like we like to party. But with to, the verb must appear in its basic infinitive form (party) rather than the past participle (we must’ve partied too hard), making it a bit different from modal of, if such a thing exists.

She must’ve partied too hard

Whether or not we’re convinced by the modal-of theory, it’s remarkable how often we make idiosyncratic analyses of the language we hear spoken around us. Sometimes these are corrected by exposure to the written language: I remember as a young child having my spelling corrected from storbry to strawberry, which led to a small epiphany for me, as that was the first time I realised the word had anything to do with either straw or berry. But many more examples slip under the radar. When these new analyses lead to permanent changes in spelling or pronunciation we sometimes call them folk etymology, as when the Spanish word cucaracha was misheard by English speakers as containing the words cock and roach, and became cockroach (you can read more about folk etymology in earlier posts by Briana and Matthew).

Meanwhile, if any readers can find clear evidence of modal of with the full pronunciation as [ɒv], please comment below! I’m quite sure I’ve heard it, but solid evidence has proven surprisingly elusive…

No we [kæn]

If something bad happened to someone you hold in contempt, would you give a fig, a shit or a flying f**k? While figs might be a luxury food item in Britain, their historical status as something that is valueless or contemptible puts them on the same level as crap, iotas and rats’ asses for the purposes of caring.

In English, we have a wide range of tools for expressing apathy. But we don’t always agree on how to express it, and even use seemingly opposite affirmative and negative sentences to express very similar concepts.  Consider the confusing distinction between ‘I couldn’t care less’ vs. ‘I could care less’ which are used in identical contexts by British and American speakers of English to mean pretty much the same thing. This mind-boggling pattern makes sense when we realise that those cold-hearted people who couldn’t care less have a care-factor of zero, while the others don’t care much, but could do so even less, if necessary.

Putting aside such oddities, negation is normally crucial to interpreting a sentence – words like ‘not’ determine whether the rest of the sentence is affirmative or negative (i.e. whether you’re claiming it is true or false). Accordingly, languages tend to mark negation clearly, sometimes in more than one place within a sentence. One of the world’s most robust languages in this respect is Bierebo, an Austronesian language spoken in Vanuatu, where no fewer than three words for expressing negation are required at once (Budd 2010: 518):

Mara   a-sa-yal              re         manu  dupwa  pwel.
NEG1   3PL.S-eat-find   NEG2  bird     ANA      NEG3
‘They didn’t get to eat the bird.’

While marking negation three times might seem a little inefficient, this pales in comparison to the problems that arise when you don’t clearly indicate it at all. We only have to turn to English to see this at work: the distinction that Received Pronunciation makes between can [kæn] and can’t [kɑ:nt] is frequently imperceptible in American varieties where final /t/ is not released, resulting in [kæn] or [kən] in both affirmative and negative contexts.

You might think that once a word or affix or sound that indicates negation has been removed from a word, there isn’t anywhere else to go. But some Dravidian languages spoken in India really push the boat out in this respect. Instead of adding some sort of negative word or affix to an affirmative sentence to signal negation, the tense affix (past –tt or future -pp) is taken away, as shown by the contrast between literary Tamil affirmatives and negatives.

pati-tt-ēn                    pati-pp-ēn                  patiy-ēn
‘I learned’                  ‘I will learn.’               ‘I do/did/will not learn.’

This is highly unusual from a linguistic point of view, and it’s tempting to think that languages avoid this type of negation because it is difficult to learn or doesn’t make sense design-wise. But historical records show similar patterns have been attested across Dravidian languages for centuries. This demonstrates that inflection patterns of this kind can be highly sustainable when they come about – so we might be stuck with the can/can’t collapse for a while to come.

On prodigal loanwords

Most people at some point in their life will have heard someone remark on how their language X (where X is any language) is getting corrupted by other languages and generally “losing its X-ness”. Today I would like to focus on one aspect of the so-called corruption of languages by other languages — lexical borrowings – and show that it’s perhaps not that bad.

European French (at least the French advertised by the Académie Française) is certainly a language about which its speakers worry, so much so that there is even an institution in charge of deciding what is French and what is not (see Helen’s earlier post). A number of English-looking/sounding words now commonly used in spoken French have indeed been taken from English, but English first took them from French!

For instance, the word flirter ‘to court someone’ is obviously adapted from English to flirt and it has the same meaning in both languages. But the English word is the adaptation of the French word fleurette in the expression conter fleurette! The expression conter fleurette is no longer used (casually) in spoken French.

“How could the universe live without your beauty?” “I wonder how sincere he is…”

Other examples of English words borrowed from (parts of) French expressions which then get adapted into French are in (2).

Thus un rosbif is an adaptation into French of roast beef, which is itself an adaptation into English of the passive participle of the verb rostir “roast”, which later became rôtir in Modern French, and buef “ox/beef”, which later became boeuf in Modern French.

The word un toast comes from English toast with the meaning “piece of toasted bread”. The English word itself was borrowed from tostée, an Old French noun derived from the verb toster which is not used in Modern French. The word pédigré comes from English pedigree but this word is itself adapted from French pied de grue “crane foot”, describing the shape of junctions in genealogical trees.

Pied de grue ‘Crane foot’

Finally, the verb distancer is transitive in Modern French, which means that it requires a direct object: thus the sentence in (a) is good because the verb distancer “distance” has a direct object, the phrase la voiture blanche  “the white car”. By contrast, the construction in (b) is not acceptable (signified by the * symbol) because it lacks an object.

a. La voiture rouge a distancé la voiture blanche.
‘The red car distanced the white car.’
b. *La voiture rouge a distancé.

The (transitive) Modern French verb distancer comes from English to distance which itself is a borrowing from the no-longer-used Old French verb distancer which was uniquely intransitive with the meaning “be far” (that is, in Old French, distancer could only be used in a construction with no direct object).

Another instance is (3): the word tonnelle ‘bower, arbor’ was borrowed into English and became tunnel under the influence of the local pronunciation. The word tunnel was then borrowed by French to refer exclusively to …. wait for it … tunnels. Both words now subsist in French with different meanings.

Une tonnelle ‘a bower’, Un tunnel ‘a tunnel’

Other examples of words that were borrowed into English and ‘came back’ into French with a different meaning are in (4).

The ancestor of tennis is the jeu de paume, during which players would say tenez “there you go” as they were about to serve (at that time the final “z” was pronounced [z]; it is not in Modern French). This word was adapted into English and became tennis, which was then borrowed back into French to refer to the sport that jeu de paume evolved into.

Jeu de paume vs. tennis

The Middle French word magasin used to refer to a warehouse, a collection of things. This word was borrowed into English and came to refer to a collection of things on paper. The word magazine was then borrowed back into French with this new meaning.

The history of the word budget is also interesting. The word bouge used to mean “bag” and a small bag was therefore bougette (the -ette suffix is used as a diminutive, e.g. fourche “pitchfork” – fourchette “fork”). The word was borrowed into English where its pronunciation was “nativized” and it came to refer to a small bag of money. It was then borrowed back into French with the new meaning of “allocated sum of money”. Finally, ticket was borrowed from English which borrowed it from French estiquet, which referred to a piece of paper where someone’s name was written.

This happens in other languages of course. For instance, Turkish took the word pistakion ‘pistachio’ from (Ancient) Greek, and it became fistik. (Modern) Greek then borrowed this word back from Turkish, spelling it phistiki, still with the meaning ‘pistachio’.

The main lesson I draw from the existence of ‘prodigal loanwords’ is that one’s impressions of language corruption often lack the perspective to actually ground that impression in reality. A French speaker looking at flirter ‘flirt’ may think that this is another sign of the influence of English — and they would be right — without being aware that this is after all a French word fleurette just coming back home.

Do you know other examples of prodigal loanwords? Please share by commenting on this post!

Sources:
L’aventure des langues en Occident, Henriette Walter
Honni soit qui mal y pense, Henriette Walter
Jérôme Serme. 1998. Un exemple de résistance à l’innovation lexicale: les “archaïsmes” du français régional, Thèse Lyon II
Javier Herráez Pindado. 2009. Les emprunts aller-retour entre le français et l’anglais dans le sport. Universidad Politécnica de Madrid.

Reindeer = rein + deer?

In linguists’ jargon, a ‘folk etymology’ refers to a change that brings a word’s form closer to some easily analyzable meaning. A textbook example is the transformation of the word asparagus into sparrowgrass in certain dialects of English.

Although clear in theory, it is not easy to decide whether ‘folk etymology’ is called for in other cases. One which has incited heated coffee-time discussion in our department is the word reindeer. The word comes ultimately from Old Norse hreindyri, composed of hreinn ‘reindeer’ and dyri ‘animal’. In present-day English, some native speakers conceive of the word reindeer as composed of two meaningful parts: rein + deer. This is something which, in the Christian tradition at least, does make a lot of sense. Given that the most prominent role of reindeer in the West is to serve as Santa’s means of transport, an allusion to ‘reins’ is unsurprising. This makes the hypothesis of folk etymology plausible.

When one explores the issue further, however, things are not that clear. The equivalent words in other Germanic languages are often the same (e.g. German Rentier, Dutch rendier, Danish rensdyr etc.) even though the element ren does not refer to the same thing as in English. However, unlike in English, another way of referring to Rudolf is indeed possible in some of these languages that omits the element ‘deer’ altogether: German Ren, Swedish ren, Icelandic hreinn, etc.

Another thing that may be relevant is the fact that the word ‘deer’ has narrowed its meaning in English to refer just to a member of the Cervidae family and not to any living creature. Other Germanic languages have preserved the original meaning ‘animal’ for this word (e.g. German Tier, Swedish djur).

Since reindeer straightforwardly descends from hreindyri, it may seem that, despite the change in the meaning of the component words, we have no reason to believe that the word was altered by folk etymology at any point. However, the story is not that simple. Words that contained the diphthong /ei/ in Old Norse do not always appear with the same vowel in English. Contrast, for example, ‘bait’ [from Norse beita] and ‘hail’ [from heill] with ‘bleak’ [from bleikr] and ‘weak’ [from veikr]. An orthographic reflection of the same fluctuation can be seen in the different pronunciation of the digraph ‘ei’ in words like ‘receive’ and ‘Keith’ vs ‘vein’ and ‘weight’. It is, thus, not impossible that the preexistence of the word rein in (Middle) English tipped the balance towards the current pronunciation of reindeer over an alternative one like “reendeer”. Also, had the word not been analyzed by native speakers as a compound of rein+deer, it is not unthinkable that the vowels may have become shorter in current English (consider the case of breakfast, etymologically descending from break + fast).

So, is folk etymology applicable to reindeer? The dispute rages on. Some of us don’t think that folk etymology is necessary to explain the fate of reindeer. That is, the easiest explanation (in William of Occam’s sense) may be to say that the word was borrowed and merely continued its overall meaning and pronunciation in an unrevolutionary way.

Others are not so sure. The availability of “fake” etymologies like rein+deer (or even rain+deer before widespread literacy) seems “too obvious” for native speakers to ignore. The suspicion of ‘folk etymology’ might be aroused by the presence of a few mild coincidences such as the “right” vowel /ei/ instead of /i:/, the fact that the term was borrowed as reindeer rather than just rein as in some other languages [e.g. Spanish reno] or by the semantic drift of deer exactly towards the kind of animal that a reindeer actually is. These are all factors that seem to conspire towards the analyzability of the word in present-day English but which would have to be put down to coincidence if they just happened for no particular reason and independently of each other. Even if no actual change had been implemented in the pronunciation of reindeer, the morphological-semantic analysis of the word has definitely changed from its source language. Under a laxer definition of what folk etymology actually is, that could on its own suffice to label this a case of folk etymology.

There seems to be, as far as we can see, no easy way out of this murky etymological and philological quagmire that allows us to conclude whether a change in the pronunciation of reindeer happened at some point due to its analyzability. To avoid endless and unproductive discussion one sometimes has to know when to stop arguing, shrug and write a post about the whole thing.