Remember, remember

A lot of the work that linguists do involves taking a language as it is spoken at a particular time, finding generalizations about how it operates, and coming up with abstractions to make sense of them. In English, for example, we identify a category of ‘number’ (with possible values ‘singular’ and ‘plural’); and we do that because in many ways the relationship between cat and cats is the same as that between mouse and mice, man and men, and so on, meaning that it would be useful to treat all of these pairings as specific examples of a more general phenomenon. We can then make the further generalization that whatever this linguistic concept of ‘number’ really is, it is not only relevant to nouns but also to verbs, and to some other items too – because English speakers all know that this cat scratches whereas these cats scratch, and you can’t have any other combination like *these cat scratch.

A black cat wearing bat wings for Halloween
This bat scratches

Once you start looking, you discover layer upon layer of generalizations like these, and you need more and more abstractions in order to take care of them all. This all gives rise to a view of language as a kind of machine built out of abstract principles, all coexisting at the same time inside a speaker’s head. On that basis, we can ask questions like: are there any principles that all languages use? Does having pattern X always go along with having pattern Y? Are there any generalizations that you can easily come up with, but that turn out not to be found anywhere? What does all this tell us about human psychology?

But that is not the only approach to language we could take. While we can point to a general principle of English to explain what is wrong with these cat, there is no similar principle explaining why we refer to the meowing, purring, scratching creature as a cat in the first place. The word cat has nothing feline about it, and the fact that we use that sequence of sounds – rather than e.g. tac – is not based on some higher-level truth that applies for all English speakers right now: instead, the ‘explanation’ is rooted in the fact that this is the word we happened to inherit from earlier generations of speakers.

Portrait photo of General Burnside, featuring his famous sideburns
General Ambrose Burnside (1824-1881)

So studying the etymology of individual words serves as a good reminder that as well as an abstract, principled system residing in human minds, every language is also a contingent historical artefact, shaped by the peoples and cultures of the past.[1] Nothing makes this more obvious than the continued existence of ordinary vocabulary items that commemorate individuals from centuries gone by – often without modern-day speakers even knowing it. In English, sandwiches are named after the Earl of Sandwich, wellingtons are named after the Duke of Wellington, and cardigans are named after the Earl of Cardigan; and the parallelism here says something about the locus of cultural influence in Georgian and Victorian Britain. More cryptically, sideburns owe their name to a General Burnside of the US Army, justly famed for his facial hair; algorithms celebrate the Persian mathematician al-Khwarizmi; and Duns Scotus, although a towering figure of medieval philosophy, now lives on in the word dunce popularized by his academic opponents.[2]

But which historical figure has had the greatest success of all in getting his name woven into the fabric of modern English? I reckon that, against all the odds, it could well be this Guy.

A close up of the face of Guy Fawkes, labelled Guido Fawkes, from a depiction of several conspirators together

While all English speakers are familiar with the word guy as an informal word corresponding to man, probably not that many know that it can be traced back to a historical figure from 400 years ago who, in a modern context, would be called a religious terrorist. Guy Fawkes was one of the conspirators in the ‘Gunpowder Plot’ of November 1605: with the aim of installing a Catholic monarchy, they planned to assassinate England’s Protestant king, James I, by blowing up Parliament with him inside. Fawkes was not one of the leaders of the conspiracy, but he was the one caught red-handed with the gunpowder; as a result, one cultural legacy of the plot’s failure is the celebration every 5th November (principally in the UK) of Guy Fawkes Night, which commonly involves letting off fireworks and setting a bonfire on which a crude effigy of Fawkes was traditionally burnt.

But how did the name of one specific Guy, for a while the most detested man in the English-speaking world, end up becoming a ubiquitous informal term applying to any man? The crucial factor is the effigy. It is unsurprising that this came to be called a Guy, ‘in honour’ of the man himself; but by the 19th century, the word was also being used to refer to actual men who dressed badly enough to earn the same label, in the way one might jokingly liken someone to a scarecrow (one British woman writing home from Madras in 1836 commented: ‘The gentlemen are all “rigged Tropical”,… grisly Guys some of them turn out!’). It is not a big step from there to using guy as a humorous and, eventually, just a colloquial word for men in general.[3]

Procession of a Guy (1864)

And of course the story does not stop there. While a guy is still almost always a man, for many speakers the plural guys can now refer to people in general, especially as a term of address. The idea that a word with such unambiguously masculine origins could ever be treated as gender-neutral has been something of a talking point in recent years, as in this article from The Atlantic about the rights and wrongs of greeting women with a friendly ‘hey guys’; but the fact that it is debated at all shows that it is happening. In fact, there is good reason to think that in some varieties of English, you-guys is being adopted as a plural form of the personal pronoun you: one piece of evidence is the existence of special possessive forms like your-guys’s, a distinctively plural version of your.

It is interesting to notice that the rise of non-standard you-guys, not unlike y’all and youse, goes some way towards ‘fixing’ an anomaly within modern English as a system: almost all nouns, and all other personal pronouns, have distinct singular and plural forms, whereas the standard language currently has the same form you doing double duty as both singular and plural. Any one of these plural versions of you might eventually win out, further strengthening the (already pretty reliable) generalization that English singulars and plurals are formally distinct. This just goes to show that the two ways of looking at language – as a synchronic system, and as a historical object – need to complement each other if we really want to understand what is going on. At the same time, it is fun to think of linguists of the distant future researching the poorly attested Ancient English language of the twenty-second century, and wondering where the mysterious personal pronoun yugaiz came from. Would anyone who didn’t know the facts dare to suggest that the second syllable of this gender-neutral plural pronoun came from the given name of a singular male criminal, executed many centuries before?

  1. For example, cat itself seems to be traceable back to an ancient language of North Africa, reflecting the fact that cats were household animals among the Egyptians for millennia before they became popular mousers in Europe.
  2. Of course, it is no accident that all of these examples feature men. Relatively few women in history have had the opportunity to turn into items of English vocabulary; in fact, fictional female characters – largely from classical mythology – have had much greater success, giving us e.g. calypso, rhea and Europe.
  3. A similar thing also happened to the word joker in the 19th century, though it didn’t get as far as guy: that suggests that sentences containing guy would once have had the same ring to them as Who’s this joker?; and then some joker turns up and says…
Is twote the past of tweet?

Have you ever encountered the form twote as a past tense of the verb to tweet? It is something of a meme on Twitter, and a live example of analogy (and its mysteries). However surprising the form may sound if you have never encountered it, it has been the prescribed one for a long time:

Ten years later, the question popped up among a linguisty Twitter crowd, where a poll again elected twote as the correct form:

It is clear that this unusual form replacing tweeted is some sort of joke, but why specifically twote? I saw here and there a reference to the verb to yeet, a slang verb very popular on the internet and meaning more or less “to throw”. Rather than a regular form yeeted, the past for to yeet is often taken to be yote. The choice of an irregular form is probably meant to produce a comedic effect.

This, precisely, is analogical production: creating a new form (twote) by extending a contrast seen in other words (yeet/yote). Analogy is a central topic in my research. I have been trying to answer questions such as: How do we decide what form to use? How difficult is it to guess? How does this contribute to language change?

But first, have you answered the poll?

What is the past tense of “to tweet”?

To investigate further why we would say twote rather than tweeted, I took out my PhD software (Qumin). Based on 6064 examples of English verbs[1], I asked Qumin to produce and rank possible past forms of tweet[2]. To do so, it read through examples to construct analogical rules (I call them patterns), then evaluated the probability of each rule among the words which sound like tweet.

Qumin found four options[3]: tweeted (/twiːtɪd/), by analogy with 32 similar words, such as greet/greeted; twet (/twɛt/), by analogy with words like meet/met; tweet (/twiːt/), by analogy with words like beat/beat; and finally twote (/twəˑʊt/), by analogy with yeet. Figure 1 shows their ranking according to Qumin, in ascending order of probability.

Twote 0.028 < tweet 0.056 < twet 0.056 < tweeted 0.86
Figure 1. Qumin’s ranking of the probability for potential past forms of to tweet
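To make the idea of analogical patterns more concrete, here is a minimal toy sketch in Python. It is not Qumin's actual algorithm: it works on spelling rather than sounds, the eight-verb lexicon is invented for illustration (it is not CELEX data), and the names `pattern` and `rank_past_forms` are hypothetical. Each present/past pair contributes an "ending rewrite" pattern, and every pattern that can apply to the target casts one vote for the form it produces.

```python
from collections import Counter

# Toy (present, past) spellings, invented for illustration only.
LEXICON = [
    ("greet", "greeted"), ("treat", "treated"), ("seat", "seated"),
    ("heat", "heated"), ("cheat", "cheated"), ("meet", "met"),
    ("beat", "beat"), ("yeet", "yote"),
]


def pattern(present, past):
    """Strip the shared prefix and return (old_ending, new_ending)."""
    i = 0
    while i < min(len(present), len(past)) and present[i] == past[i]:
        i += 1
    return present[i:], past[i:]


def rank_past_forms(target, lexicon):
    """Rank candidate pasts of `target`, least likely first.

    Each lexicon pair whose pattern applies to `target` votes for the
    form that pattern produces; votes are normalised to sum to 1.
    """
    votes = Counter()
    for present, past in lexicon:
        old, new = pattern(present, past)
        if target.endswith(old):
            candidate = target[:len(target) - len(old)] + new
            votes[candidate] += 1
    total = sum(votes.values())
    ranked = [(form, count / total) for form, count in votes.items()]
    return sorted(ranked, key=lambda item: item[1])


if __name__ == "__main__":
    for form, probability in rank_past_forms("tweet", LEXICON):
        print(f"{form}: {probability:.3f}")
```

With this toy lexicon the same four candidates come out (twet, tweet, twote, tweeted), with the regular form on top simply because more verbs support it. The exact numbers depend entirely on the made-up lexicon and should not be compared with Figure 1.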

As we can see, Qumin finds twote to be the least likely solution. This is a reasonable position overall (indeed, tweeted is the regular form), so why would both the official Twitter account and many Twitter users (including several linguists) prefer twote to tweeted?

Part of the answer is that Qumin has no idea what is cool, a factor which makes yeet/yote (already a slang word, used on the internet) a particularly appealing choice. Moreover, Qumin has no access to semantic similarity, which could also play a role. Verbs that have similar meanings can be preferred as support for the analogy. In the current case, both speak/spoke and write/wrote have similar pasts to twote, which might help make it sound acceptable. Some speakers seem to be aware of these factors, as seen in the tweet above.

What about usage?

Are most speakers aware of the variant twote and using it? Before concluding that the model is mistaken, we need to observe what speakers actually use. Indeed, only usage truly determines “what is the past of tweet”. For this, I turn to (automatically) sifting through Twitter data.

Speakers must choose between tweeted or twote: what a dilemma!

A few problems: first, the form “tweet” is also a noun, and identical to the present tense of the verb. Second, “twet” is attested (sometimes as “twett”), but mostly as a synonym for the noun “tweet” (often in a playful “lolcat” style), or as a present verbal form, with a few exceptions, usually of a meta nature (see tweets below). I couldn’t find a way to automatically distinguish these from past forms while also staying within the Twitter API limits. Thus, I left out both from the search entirely. This leaves only our two main contestants.

 

I extracted as many recent tweets containing tweeted or twote as Twitter would let me: around 300 000 tweets twotten between the 26th of August and the 3rd of September. After refining the search[4], 186 777 tweets remained. Of these, fewer than 5000 contain twote:

There were more than 180 000 occurrences of tweeted and fewer than 5000 of twote in the past few days.
Counts of tweets containing either of two possible pasts for the verb “to tweet” in the past few days on twitter (mentions excluded).
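The refinement step (excluding mentions and likely past participles, as detailed in footnote 4) could be approximated along the following lines. This is only a sketch under stated assumptions, not the code used for the post: the function name `classify`, the regular expressions, the list of "alternate forms", and the choice to allow any single word between auxiliary and verb as a stand-in for "an adverb" are all hypothetical simplifications.

```python
import re
from collections import Counter

FORMS = ("tweeted", "twote")

# Auxiliary directly before the form, optionally separated by one word
# (a crude stand-in for "possibly separated by an adverb").
PARTICIPLE = re.compile(
    r"\b(?:has|have|had|is|are|was|were)\s+(?:\w+\s+)?(?:tweeted|twote)\b",
    re.IGNORECASE,
)
# The form wrapped in quotation marks counts as a mention.
QUOTED = re.compile(r"""["'“‘](?:tweeted|twote)["'”’]""", re.IGNORECASE)
# Assumed list of alternate forms whose presence signals meta-discussion.
ALTERNATES = re.compile(r"\b(?:twet|twett|twotten)\b", re.IGNORECASE)


def classify(text):
    """Return 'tweeted' or 'twote' for a usage, or None if excluded."""
    lower = text.lower()
    found = [form for form in FORMS if re.search(rf"\b{form}\b", lower)]
    if len(found) != 1:
        return None  # neither form, or both co-occurring (a mention)
    if "#" in text or "past tense" in lower:
        return None  # hashtags and explicit grammar talk are excluded
    if QUOTED.search(text) or ALTERNATES.search(text):
        return None
    if PARTICIPLE.search(text):
        return None  # likely a past participle rather than a simple past
    return found[0]


def count_usages(tweets):
    """Count retained usages of each form in an iterable of tweet texts."""
    return Counter(label for label in map(classify, tweets) if label)


if __name__ == "__main__":
    sample = [
        "He tweeted about it yesterday",
        "She twote a reply",
        "I have already tweeted this",
        "Is it 'twote' or tweeted?",
    ]
    print(count_usages(sample))
```

A filter like this errs on the side of throwing tweets away, which seems a reasonable trade-off when the goal is to compare genuine usage of the two forms rather than to count every occurrence.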

As you can see, the tweeted bar completely dwarfs the other one. However amusing and fitting twote may be, and despite @Twitter’s prescription (but conforming with Qumin’s prediction), the regular past form is by far the most used, even on the platform itself, which lends itself to playful and impactful statements. This easily closes this particular English Past Tense Debate. If only it were always this simple!

  1. The English verb data I used includes only the present and past tenses, and is derived from the CELEX2 dataset, as used in my PhD dissertation and manually supplemented by the forms for “yeet”. The CELEX2 dataset is commercial, and I cannot distribute it.
  2. The code I used for this blog post is available here, but not the dataset itself. Note that for scientific reasons I won’t discuss here, this software works on sounds, not orthography.
  3. One last possibility has been ignored by this polite software, a form which follows the pattern of sit/sat. I see it used from time to time for its comic effect, but it does not seem at all frequent enough to be a real contestant (and I do not recommend searching this keyword on Twitter).
  4. Since there has been a lot of discussion on the correct form, I exclude all clear cases of mentions. I count as mentions any occurrences wrapped in quotations, co-occurring with alternate forms, mentioning past tense, or with a hashtag. Moreover, with the forms in -ed, it is likely that the past participle would be identical, but for twote, the past participle could well be twotten. To reduce the bias due to the presence of more past participles in the usage of tweeted, I also exclude all contexts where the word is preceded by the auxiliary forms has, have, had, is, are, was, were, possibly separated by an adverb.
Eggcorns and mondegreens: a feast of misunderstandings

Have you ever felt that you needed to nip something in the butt, or had the misfortune to witness a damp squid? And what can Jimi Hendrix, Bon Jovi and Freddie Mercury tell us about language change?

Well, if you know Hendrix’s classic “Purple Haze”, you surely remember the moment where he interrupts his train of thought with the unexpected request, ‘Scuse me while I kiss this guy. Or perhaps you recall “Living on a Prayer”, where we hear that apparently It doesn’t make a difference if we’re naked or not. And who can forget the revelation, in “Bohemian Rhapsody”, that Beelzebub has a devil for a sideboard?

Wise words from Celine Dion

If you do remember these lyrics fondly, you are not alone – lots of people are familiar with these exact lines. There is just one problem, of course: none of those songs really say those things. Instead, the lyrics involved are ‘Scuse me while I kiss the sky; It doesn’t make a difference if we make it or not; and Beelzebub has a devil put aside for me. And yet thousands of English speakers the world over have had the experience of listening to “Purple Haze” and the others – and of misunderstanding the words, entirely independently, in exactly the same way.

Mishearings of this kind are common enough that they have been given a name of their own, mondegreens – a word invented by the American writer Sylvia Wright, who as a child heard a poem containing the following lines:

For they hae slain the Earl o’ Moray
And laid him on the green

and assumed that it listed not one but two victims – the unfortunate Earl himself, and “Lady Mondegreen”, a plausible character who happens not to feature in the real poem.

Why does this kind of thing happen? One reason has to do with the nature of spoken language. On the page, English sentences come pre-packaged into words, each of which is made up of distinct, easily-identified letters which look pretty much the same every time. But pronounced out loud, they are not like that! Instead, a continuous, mushy stream of noise makes its way into our ears, and it is up to our brains to work out what speech sounds are actually in there, where one word ends and the next one begins (think the-sky versus this-guy), and so on. Obviously this process is not exactly helped when there are rock guitars competing for your attention too.

Obama’s elf….. don’t wanna be… Obama’s elf… any more…

But another reason is that we are never ‘just listening’ passively. Instead, behind the scenes, our minds are busy trying to relate what we’re hearing to our existing knowledge – not only our linguistic knowledge, but our general knowledge about the world. One example is the common-sense knowledge that people tend to kiss other people, rather than intangible abstractions like the sky. This is obviously very useful most of the time, but in the “Purple Haze” case it leads us astray, because the more implausible meaning is the one that Jimi Hendrix intended.

What has this all got to do with language change? Well, the crucial point is that what I’ve just said – interpreting sounds is complicated, and to navigate the process we engage our common sense as well as our knowledge of the language – applies just as well to normal conversation as it does to song lyrics. We don’t always hear things perfectly, and even if we do, we have to square the things we’ve just heard with the things we already knew, which provide a guide for our interpretation but may sometimes take us in the wrong direction.

So if you hear someone referring to a really disappointing experience as a damp squib, but are not familiar with squib (an old-fashioned word for a firework), what is to stop you thinking that what you really heard was damp squid? A squid is, after all, a very damp creature, and not always something that people are hugely fond of. Similarly, the expression to nip in the bud makes sense if you latch on to the gardening metaphor it is based on – but if you don’t, well, nipping an undesirable thing in the butt does sound like a very effective way of getting rid of it. So, people who think the expressions really are damp squid and nip in the butt have made a mistake along the lines of “kiss this guy”; the difference is that here they may end up using the new versions in their own speech, and thus pass them on to other speakers. And the process doesn’t have to involve whole expressions: individual words are susceptible to it too, for example midriff becoming mid-rift or utmost becoming up-most.

It’s beautiful, but undeniably damp

Misinterpreted words and expressions like these, which have some kind of new internal logic of their own, are known as eggcorns. This is because egg-corn is exactly how some English speakers have reinterpreted the word acorn, on the basis that acorns are indeed egg-shaped seeds. And the development of a new eggcorn may not involve any mishearing at all, just reinterpretation of one word as another one that sounds exactly the same. Are you expected to toe the line or to tow the line? Are people given free rein or free reign? In each case the two expressions sound identical, and each brings with it some kind of coherent mental image. For the moment, toe the line and free rein are still considered to be the ‘correct’ versions of these idioms, but perhaps in the future that will no longer be the case.

As words and expressions are reinterpreted over time, the language changes little by little: in speech and in writing, people pass on their reinterpretations to one another, in a way which may eventually pass right through the language. The underlying factors producing eggcorns are the same as those producing mondegreens. But unlike the lyrics of “Purple Haze”, words and idioms don’t generally have a fixed author and don’t belong to anybody, meaning that if everyone started calling acorns eggcorns, then that just would be the correct word for them: the previous, now meaningless term acorn would be no more than a historical curiosity, and English as a whole would be very slightly different from how it is now.

So this is how we get from Jimi Hendrix to language change – via mondegreens and eggcorns. Have you spotted any eggcorns in the wild? And how likely do you think they are to catch on and become the new normal?

Werewolves

Hallowe’en will soon be upon us, so it is only right we turn our attention to monsters. Consider the werewolf. It’s a wolf, sort of, as the name indicates, but what’s a were? The usual assumption is that it’s a leftover of an older word meaning ‘man’ that fell completely out of fashion by the 14th century. As a result we have what looks like a compound word, except that one of the parts doesn’t have any meaning on its own. Perhaps not, but that hasn’t stopped people from squeezing some value out of it nonetheless: if a werewolf is a person who turns into a wolf — or at any rate, part person, part wolf — then a were-bear is a mixture of person and bear, and so on down to were-turtles.

Actually, people don’t seem to be that literal-minded when it comes to word meanings, if the various were-creatures in circulation are any evidence. The monster from “Wallace and Gromit: Curse of the Were-Rabbit” is not half-human, half-rabbit, but more just kind of a monster rabbit, with a thicker pelt. (Visually calqued, I suspect, from the not-particularly wolf-like wolfman of the wolfman movies featuring Lon Chaney Jr.)

And were-fleas, to the extent that they exist, appear to be carriers of lycanthropism rather than human/insect conglomerates. None of this is yet reflected in the Oxford English Dictionary’s entry on were- (you need a subscription for that but it’s free if you have a UK public library card!). Give it a few decades more maybe.

Strangely, words for werewolf in other languages share a propensity for being compounds made up of ‘wolf’ plus some other completely opaque element. The first part of Czech vlkodlak is vlk, which means ‘wolf’, but dlak on its own is not an independent word. (Not in Czech at any rate, but in the related language Slovenian the equivalent word volkodlak is clearly made up of volk ‘wolf’ and dlaka, which means ‘hair’ or ‘fur’.) And the French werewolf, loup-garou, has the word for ‘wolf’ in it (loup), but garou is not an independent word (other than being an unrelated homonym meaning ‘flax-leaved daphne’). That part seems to have been our very own Germanic word werewolf borrowed at an early date (earliest attestation as garwall from the 12th century). Both of these have, like werewolf, given rise to further monstrous hybrids like Czech prasodlak, from prase ‘pig’, or the French cochon-garou.

In fact, Czech and French have gone one step further than English. Though I just wrote that dlak and garou were not words, that was being a bit pedantic. Neither of them is listed in the authoritative Academy dictionaries of Czech and French, but nonetheless they do seem to have split off from their host body, rather as happened, if we can be permitted to mix monster metaphors, to the hero of 1959’s “The Manster” (a.k.a. “The Split”).

For example, this Czech website tells us about vlkodlaci i jiní dlaci ‘werewolves and other were-creatures’ (dlaci is the plural of dlak), and in French the phrase courir le garou ‘run the garou’ at least used to be in circulation, meaning basically ‘go around at night being a werewolf’. That use in turn apparently spawned a verb garouter, meaning much the same thing. The curse lives on.