Browsed by
Category: Analogy

Is twote the past of tweet?

Is twote the past of tweet?

Have you ever encountered the form twote as a past tense of the verb to tweet? It is something of a meme on Twitter, and a live example of analogy (and its mysteries). However surprising the form may sound if you have never encountered it, it has been the prescribed one for a long time:

Ten years later, the question popped up among a linguisty Twitter crowd, where a poll again elected twote as the correct form:

It is clear that this unusual form replacing tweeted is some sort of form, but why specifically twote? I saw here and there a reference to the verb to yeet, a slang verb very popular on the internet and meaning more or less “to throw”. Rather than a regular form yeeted, the past for to yeet is often taken to be yote. The choice of an irregular form is probably meant to produce a comedic effect.

This, precisely, is analogical production: creating a new form (twote) by extending a contrast seen in other words (yeet/yote). Analogy is a central topic in my research. I have been trying to answer questions such as: How do we decide what form to use ? How difficult is it to guess? How does this contribute to language change?

But first, have you answered the poll?

What is the past tense of “to tweet”?

To investigate further why we would say twote rather than tweeted, I took out my PhD software (Qumin). Based on 6064 examples of English verbs1, I asked Qumin to produce and rank possible past forms of tweet2. To do so, it read through examples to construct analogical rules (I call them patterns), then evaluated the probability of each rule among the words which sound like tweet.

Qumin found four options3: tweeted (/twiːtɪd/), by analogy with 32 similar words, such as greet/greeted; twet (/twɛt/), by analogy with words like meet/met; tweet (/twiːt/) by analogy with words like beat/beat, finally twote (/twəˑʊt/), by analogy with yeet. Figure 1 provides their ranking (in ascending order) according to Qumin, with the associated probabilities.

Twote 0.028 < tweet 0.056 < twet 0.056 < tweeted 0.86
Figure 1. Qumin’s ranking of the probability for potential past forms of to tweet

As we can see, Qumin finds twote to be the least likely solution. This is a reasonable position overall (indeed, tweeted is the regular form), so why would both the official Twitter account and many Twitter users (including several linguists) prefer twote to tweeted?

But Qumin has no idea what is cool, a factor which makes yeet/yote (already a slang word, used on the internet) a particularly appealing choice. Moreover, Qumin has no access to semantic similarity, which could also play a role. Verbs that have similar meanings can be preferred as support for the analogy. In the current case, both speak/spoke and write/wrote have similar pasts to twote, which might help make it sound acceptable. Some speakers seem to be aware of these factors, as seen in the tweet above.

What about usage?

Are most speakers aware of the variant twote and using it? Before concluding that the model is mistaken, we need to observe what speakers actually use. Indeed, only usage truly determines “what is the past of tweet”. For this, I turn to (automatically) sifting through Twitter data.

Speakers must choose between tweeted or twote: what a dilemna !

A few problems: first, the form “tweet” is also a noun, and identical to the present tense of the verb. Second, “twet” is attested (sometimes as “twett”), but mostly as a synonym for the noun “tweet” (often in a playful “lolcat” style), or as a present verbal form, with a few exceptions, usually of a meta nature (see tweets below). I couldn’t find a way to automatically distinguish these from past forms while also managing within the Twitter API limits. Thus, I left out both from the search entirely. This leaves only our two main contestants.

 

I extracted as many recent tweets containing tweeted or twote as Twitter would let me — around 300 000 tweets twotten between the 26th of August and the 3rd of September. 186777 tweets remained after refining the search4. Of these, less than 5000 contain twote:

There were more than 180000 occurences of tweeted and less than 5000 of twote in the past few days.
Counts of tweets containing either of two possible pasts for the verb “to tweet” in the past few days on twitter (mentions excluded).

As you can see, the tweeted bar completely dwarfs the other one. However amusing and fitting twote may be, and despite @Twitter’s prescription (but conforming with Qumin’s prediction), the regular past form is by far the most used, even on the platform itself, which lends itself to playful and impactful statements. This easily closes this particular English Past Tense Debate. If only it were always this simple!

  1. The English verb data I used includes only the present and past tenses, and is derived from the CELEX 2 dataset, as used in my PhD dissertation and manually supplemented by the forms for “yeet”. The CELEX2 dataset is commercial, and I can not distribute it. []
  2. The code I used for this blog post is available here, but not the dataset itself. Note that for scientific reasons I won’t discuss here, this software works on sounds, not orthography. []
  3. One last possibility has been ignored by this polite software, a form which follows the pattern of sit/sat. I see it used from time to time for its comic effect, but it does not seem at all frequent enough to be a real contestant (and I do not recommend searching this keyword on Twitter). []
  4. Since there has been a lot of discussion on the correct form, I exclude all clear cases of mentions. I count as mentions any occurrences wrapped in quotations, co-occurring with alternate forms, mentioning past tense, or with a hashtag. Moreover, with the forms in –ed, it is likely that the past participle would be identical, but for twote, the past participle could well be twotten. To reduce the bias due to the presence of more past participles in the usage of tweeted, I also exclude all contexts where the word is preceded by the auxiliary forms has, have, had, is, are, was, were, possibly separated by an adverb. []
Eggcorns and mondegreens: a feast of misunderstandings

Eggcorns and mondegreens: a feast of misunderstandings

Have you ever felt that you needed to nip something in the butt, or had the misfortune to witness a damp squid? And what can Jimi Hendrix, Bon Jovi and Freddie Mercury tell us about language change?

Well, if you know Hendrix’s classic “Purple Haze”, you surely remember the moment where he interrupts his train of thought with the unexpected request, ‘Scuse me while I kiss this guy. Or perhaps you recall “Living on a Prayer”, where we hear that apparently It doesn’t make a difference if we’re naked or not. And who can forget the revelation, in “Bohemian Rhapsody”, that Beelzebub has a devil for a sideboard?

Wise words from Celine Dion

If you do remember these lyrics fondly, you are not alone – lots of people are familiar with these exact lines. There is just one problem, of course: none of those songs really say those things. Instead, the lyrics involved are ‘Scuse me while I kiss the sky; It doesn’t make a difference if we make it or not; and Beelzebub has a devil put aside for me. And yet thousands of English speakers the world over have had the experience of listening to “Purple Haze” and the others – and of misunderstanding the words, entirely independently, in exactly the same way.

Mishearings of this kind are common enough that they have been given a name of their own, mondegreens – a word invented by the American writer Sylvia Wright, who as a child heard a poem containing the following lines:

For they hae slain the Earl o’ Moray
And laid him on the green

and assumed that it listed not one but two victims – the unfortunate Earl himself, and “Lady Mondegreen”, a plausible character who happens not to feature in the real poem.

Why does this kind of thing happen? One reason has to do with the nature of spoken language. On the page, English sentences come pre-packaged into words, each of which is made up of distinct, easily-identified letters which look pretty much the same every time. But pronounced out loud, they are not like that! Instead, a continuous, mushy stream of noise makes its way into our ears, and it is up to our brains to work out what speech sounds are actually in there, where one word ends and the next one begins (think the-sky versus this-guy), and so on. Obviously this process is not exactly helped when there are rock guitars competing for your attention too.

Obama’s elf….. don’t wanna be… Obama’s elf… any more…

But another reason is that we are never ‘just listening’ passively. Instead, behind the scenes, our minds are busy trying to relate what we’re hearing to our existing knowledge – not only our linguistic knowledge, but our general knowledge about the world. For example, the common-sense knowledge that people tend to kiss other people, rather than intangible abstractions like the sky. This is obviously very useful most of the time, but in the “Purple Haze” case it leads us astray, because the more implausible meaning is the one that Jimi Hendrix intended.

What has this all got to do with language change? Well, the crucial point is that what I’ve just said – interpreting sounds is complicated, and to navigate the process we engage our common sense as well as our knowledge of the language – applies just as well to normal conversation as it does to song lyrics. We don’t always hear things perfectly, and even if we do, we have to square the things we’ve just heard with the things we already knew, which provide a guide for our interpretation but may sometimes take us in the wrong direction.

So if you hear someone referring to a really disappointing experience as a damp squib, but are not familiar with squib (an old-fashioned word for a firework), what is to stop you thinking that what you really heard was damp squid? A squid is, after all, a very damp creature, and not always something that people are hugely fond of. Similarly, the expression to nip in the bud makes sense if you latch on to the gardening metaphor it is based on – but if you don’t, well, nipping an undesirable thing in the butt does sound like a very effective way of getting rid of it. So, people who think the expressions really are damp squid and nip in the butt have made a mistake along the lines of “kiss this guy”; the difference is that here they may end up using the new versions in their own speech, and thus pass them on to other speakers. And the process doesn’t have to involve whole expressions: individual words are susceptible to it too, for example midriff becoming mid-rift or utmost becoming up-most.

It’s beautiful, but undeniably damp

Misinterpreted words and expressions like these, which have some kind of new internal logic of their own, are known as eggcorns. This is because egg-corn is exactly how some English speakers have reinterpreted the word acorn, on the basis that acorns are indeed egg-shaped seeds. And the development of a new eggcorn may not involve any mishearing at all, just reinterpretation of one word as another one that sounds exactly the same. Are you expected to toe the line or to tow the line? Are people given free rein or free reign? In each case the two expressions sound identical, and each brings with it some kind of coherent mental image. For the moment, toe the line and free rein are still considered to be the ‘correct’ versions of these idioms, but perhaps in the future that will no longer be the case.

As words and expressions are reinterpreted over time, the language changes little by little: in speech and in writing, people pass on their reinterpretations to one another, in a way which may eventually pass right through the language. The underlying factors producing eggcorns are the same as those producing mondegreens. But unlike the lyrics of “Purple Haze”, words and idioms don’t generally have a fixed author and don’t belong to anybody, meaning that if everyone started calling acorns eggcorns, then that just would be the correct word for them: the previous, now meaningless term acorn would be no more than a historical curiosity, and English as a whole would be very slightly different from how it is now.

So this is how we get from Jimi Hendrix to language change – via mondegreens and eggcorns. Have you spotted any eggcorns in the wild? And how likely do you think they are to catch on and become the new normal?

Werewolves

Werewolves

Hallowe’en will soon be upon us, so it is only right we turn our attention to monsters. Consider the werewolf. It’s a wolf, sort of, as the name indicates, but what’s a were? The usual assumption is that it’s a leftover of an older word meaning ‘man’ that fell completely out of fashion by the 14th century. As a result we have what looks like a compound word, except that one of the parts doesn’t have any meaning on its own. Perhaps not, but that hasn’t stopped people from squeezing some value out of it nonetheless: if a werewolf is a person who turns into a wolf — or at any rate, part person, part wolf — then a were-bear is a mixture of person and bear, and so on down to were-turtles.

Actually, people don’t seem to be that literal-minded when it comes to word meanings, if the various were-creatures in circulation are any evidence. The monster from “Wallace and Gromit: Curse of the Were-Rabbit” is not half-human, half-rabbit, but more just kind of a monster rabbit, with a thicker pelt. (Visually calqued, I suspect, from the not-particularly wolf-like wolfman of the wolfman movies featuring Lon Chaney Jr.)

And were-fleas, to the extent that they exist, appear to be carriers of lycanthropism rather than human/insect conglomerates. None of this is yet reflected in the Oxford English Dictionary’s entry on were– (you need a subscription for that but it’s free if you have a UK public library card!). Give it a few decades more maybe.

Strangely, words for werewolf in other languages share a propensity for being compounds made up of ‘wolf’ plus some other completely opaque element. The first part of Czech vlkodlak is vlk, which means ‘wolf‘, but dlak on its own is not an independent word. (Not in Czech at any rate, but in the related language Slovenian the equivalent word volkodlak is clearly made up of volk ‘wolf’ and dlaka, which means ‘hair’ or ‘fur’.) And the French werewolf, loup-garou, has the word for ‘wolf’ in it (loup), but garou is not an independent word (other than being an unrelated homonym meaning ‘flax-leaved daphne’). That part seems to have been our very own Germanic word werewolf borrowed at an early date (earliest attestation as garwall from the 12th century). Both of these have, like werewolf, given rise to further monstrous hybrids like Czech prasodlak, from prase ‘pig’, or the French cochon-garou.

In fact, Czech and French have gone one step further than English. Though I just wrote that dlak and garou were not words, that was being a bit pedantic. Neither of them are listed in the authoritative Academy dictionaries of Czech and French, but nonetheless they do seem to have split off from their host body, rather as happened — if we can be permitted to mix monster metaphors — to the hero of 1959’s “The Manster (a.k.a The Split)”.

For example, this Czech website tells us about vlkodlaci i jiní dlaci ‘werewolves and other were-creatures’ (dlaci is the plural of dlak), and in French the phrase courir le garou ‘run the garou‘ used, at least, to be in circulation, meaning basically ‘go around at night being a werewolf”. That use in turn apparently spawned a verb garouter, meaning much the same thing. The curse lives on.