Is twote the past of tweet?

Have you ever encountered the form twote as a past tense of the verb to tweet? It is something of a meme on Twitter, and a live example of analogy (and its mysteries). However surprising the form may sound if you have never encountered it, it has been the prescribed one for a long time:

Ten years later, the question popped up among a linguisty Twitter crowd, where a poll again elected twote as the correct form:

It is clear that this unusual form replacing tweeted is some sort of joke, but why specifically twote? I saw here and there a reference to the verb to yeet, a slang verb very popular on the internet and meaning more or less “to throw”. Rather than a regular form yeeted, the past for to yeet is often taken to be yote. The choice of an irregular form is probably meant to produce a comedic effect.

This, precisely, is analogical production: creating a new form (twote) by extending a contrast seen in other words (yeet/yote). Analogy is a central topic in my research. I have been trying to answer questions such as: How do we decide what form to use? How difficult is it to guess? How does this contribute to language change?

But first, have you answered the poll?

What is the past tense of “to tweet”?

To investigate further why we would say twote rather than tweeted, I took out my PhD software (Qumin). Based on 6064 examples of English verbs[1], I asked Qumin to produce and rank possible past forms of tweet[2]. To do so, it read through examples to construct analogical rules (I call them patterns), then evaluated the probability of each rule among the words which sound like tweet.

Qumin found four options[3]: tweeted (/twiːtɪd/), by analogy with 32 similar words, such as greet/greeted; twet (/twɛt/), by analogy with words like meet/met; tweet (/twiːt/), by analogy with words like beat/beat; and finally twote (/twəˑʊt/), by analogy with yeet. Figure 1 provides their ranking (in ascending order) according to Qumin, with the associated probabilities.

Twote 0.028 < tweet 0.056 < twet 0.056 < tweeted 0.86
Figure 1. Qumin’s ranking of the probability for potential past forms of to tweet
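
To make the idea of ranking analogical patterns concrete, here is a minimal sketch in Python. It is not Qumin's algorithm: it works on spelling rather than sounds, it uses a tiny hand-picked list of rhyming verbs instead of the CELEX data, and the names `TOY_VERBS`, `pattern`, `apply_pattern` and `rank_past_forms` are invented for illustration. Each present/past pair is reduced to "ending removed, ending added", every pattern is applied to the target verb, and the candidates are scored by the share of neighbours that support them.

```python
from collections import Counter

# Toy (present, past) pairs in ordinary spelling, chosen because they rhyme
# with "tweet". This is an illustrative assumption, not the real dataset.
TOY_VERBS = [
    ("greet", "greeted"), ("seat", "seated"), ("heat", "heated"),
    ("treat", "treated"), ("cheat", "cheated"),
    ("meet", "met"), ("beat", "beat"), ("yeet", "yote"),
]


def pattern(present, past):
    """Describe a present/past contrast as (ending removed, ending added)."""
    i = 0
    while i < min(len(present), len(past)) and present[i] == past[i]:
        i += 1
    return present[i:], past[i:]


def apply_pattern(pat, verb):
    """Apply a pattern to a verb, or return None if the ending does not fit."""
    removed, added = pat
    if not verb.endswith(removed):
        return None
    return verb[: len(verb) - len(removed)] + added


def rank_past_forms(verb, lexicon):
    """Score each candidate past form by the share of neighbours backing it."""
    counts = Counter()
    for present, past in lexicon:
        candidate = apply_pattern(pattern(present, past), verb)
        if candidate is not None:
            counts[candidate] += 1
    total = sum(counts.values())
    ranked = [(form, n / total) for form, n in counts.items()]
    return sorted(ranked, key=lambda item: -item[1])


for form, probability in rank_past_forms("tweet", TOY_VERBS):
    print(f"{form}: {probability:.3f}")
```

With this toy list the regular form wins simply because most rhyming neighbours are regular, which is the same qualitative picture as in Figure 1, although the numbers are of course not comparable.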

As we can see, Qumin finds twote to be the least likely solution. This is a reasonable position overall (indeed, tweeted is the regular form), so why would both the official Twitter account and many Twitter users (including several linguists) prefer twote to tweeted?

But Qumin has no idea what is cool, a factor which makes yeet/yote (already a slang word, used on the internet) a particularly appealing choice. Moreover, Qumin has no access to semantic similarity, which could also play a role. Verbs that have similar meanings can be preferred as support for the analogy. In the current case, both speak/spoke and write/wrote have similar pasts to twote, which might help make it sound acceptable. Some speakers seem to be aware of these factors, as seen in the tweet above.

What about usage?

Are most speakers aware of the variant twote and using it? Before concluding that the model is mistaken, we need to observe what speakers actually use. Indeed, only usage truly determines “what is the past of tweet”. For this, I turn to (automatically) sifting through Twitter data.

Speakers must choose between tweeted or twote: what a dilemma!

A few problems: first, the form “tweet” is also a noun, and identical to the present tense of the verb. Second, “twet” is attested (sometimes as “twett”), but mostly as a synonym for the noun “tweet” (often in a playful “lolcat” style), or as a present verbal form, with a few exceptions, usually of a meta nature (see tweets below). I couldn’t find a way to automatically distinguish these from past forms while also managing within the Twitter API limits. Thus, I left out both from the search entirely. This leaves only our two main contestants.


I extracted as many recent tweets containing tweeted or twote as Twitter would let me: around 300 000 tweets twotten between the 26th of August and the 3rd of September. 186777 tweets remained after refining the search[4]. Of these, fewer than 5000 contain twote:

There were more than 180000 occurrences of tweeted and fewer than 5000 of twote in the past few days.
Counts of tweets containing either of two possible pasts for the verb “to tweet” in the past few days on twitter (mentions excluded).

As you can see, the tweeted bar completely dwarfs the other one. However amusing and fitting twote may be, and despite @Twitter’s prescription (but conforming with Qumin’s prediction), the regular past form is by far the most used, even on the platform itself, which lends itself to playful and impactful statements. This easily closes this particular English Past Tense Debate. If only it were always this simple!

  1. The English verb data I used includes only the present and past tenses, and is derived from the CELEX2 dataset, as used in my PhD dissertation and manually supplemented by the forms for “yeet”. The CELEX2 dataset is commercial, and I cannot distribute it.
  2. The code I used for this blog post is available here, but not the dataset itself. Note that for scientific reasons I won’t discuss here, this software works on sounds, not orthography.
  3. One last possibility has been ignored by this polite software, a form which follows the pattern of sit/sat. I see it used from time to time for its comic effect, but it does not seem at all frequent enough to be a real contestant (and I do not recommend searching this keyword on Twitter).
  4. Since there has been a lot of discussion on the correct form, I exclude all clear cases of mentions. I count as mentions any occurrences wrapped in quotations, co-occurring with alternate forms, mentioning past tense, or with a hashtag. Moreover, with the forms in –ed, it is likely that the past participle would be identical, but for twote, the past participle could well be twotten. To reduce the bias due to the presence of more past participles in the usage of tweeted, I also exclude all contexts where the word is preceded by the auxiliary forms has, have, had, is, are, was, were, possibly separated by an adverb.
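
The exclusion rules in footnote 4 could be approximated along the following lines. This is only a sketch under several assumptions: the function name `is_counted` is invented, "with a hashtag" is read as the form itself being hashtagged, and "possibly separated by an adverb" is approximated as at most one intervening word of any kind. The actual filtering used for the counts may well have differed.

```python
import re

FORMS = ("tweeted", "twote")
AUXILIARIES = r"(?:has|have|had|is|are|was|were)"
# Straight and curly quotation marks (assumed set, for illustration).
QUOTES = "\"'\u201c\u201d\u2018\u2019"


def is_counted(text, form):
    """Return True if `text` looks like a plain past-tense use of `form`."""
    lowered = text.lower()
    word = re.escape(form)
    if not re.search(rf"\b{word}\b", lowered):
        return False
    # Mention: the form is wrapped in quotation marks.
    if re.search(rf"[{QUOTES}]{word}[{QUOTES}]", lowered):
        return False
    # Mention: the form is used as a hashtag.
    if re.search(rf"#{word}\b", lowered):
        return False
    # Mention: the alternate form appears in the same tweet.
    others = [other for other in FORMS if other != form]
    if any(re.search(rf"\b{re.escape(other)}\b", lowered) for other in others):
        return False
    # Mention: the tweet talks about the past tense explicitly.
    if "past tense" in lowered:
        return False
    # Likely participle: an auxiliary precedes the form, optionally with one
    # intervening word standing in for an adverb.
    if re.search(rf"\b{AUXILIARIES}\s+(?:\w+\s+)?{word}\b", lowered):
        return False
    return True


print(is_counted("She tweeted about it yesterday", "tweeted"))
print(is_counted("I have already tweeted this", "tweeted"))
```

Allowing any single word between the auxiliary and the verb is deliberately generous: it will discard a few genuine past-tense uses, which seems a safer error than counting participles.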
The Story of Aubergine

As the University of Surrey’s foremost (and indeed only) blog about languages and how they change, MORPH is enjoyed by literally dozens of avid readers from all over the world. But so far these multitudes have not received an answer to the one big linguistic question besetting modern society. Namely, what on earth is going on with the name of the plant that British English calls the aubergine, but that in other times and places has been called eggplant, melongene, brown-jolly, mad-apple, and so much more? Where do all these weird names come from?

I think the time has finally come to put everyone’s mind at rest. Aubergines may not seem particularly eggy, melonish, jolly or mad, but lots of the apparently diverse and whimsical terms for them used in English and other languages are actually connected – and in trying to understand how, we can get some insight about how vocabulary spreads and develops over time. It turns out that one powerful impulse behind language change is the fact that speakers like to ‘make sense’ of things that do not inherently make sense. What do I mean by that? Stay tuned to find out.

Long purple aubergine

To get one not-so-linguistic point out of the way first, there is no real mystery about eggplant (the word generally used in the US and some other English-speaking countries, dating back to the 18th century), which is not linked to anything else I am talking about here. It is hard to imagine mistaking the large, purple fruit in the photo above for any kind of egg, but that is not the only kind of aubergine in existence. There are cultivars with a much more oval shape, and even ones with white rather than purple skin: pictures like this, showing an imposter alongside some real eggs, make it obvious how the word eggplant was able to catch on.

Small white eggshaped aubergine in an eggbox between two real eggs

Meanwhile, aubergine, which is borrowed from French as you might expect, has a much more complex history, and can be traced back over many centuries, hopping from language to language with minor adjustments along the way. The plant is not native to the US, Britain or France, but to southern or eastern Asia, and investigating the history of the word will eventually take us back in the right geographical direction. Aubergine got into French from the Catalan albergínia, whose first syllable gives us a clue as to where we should look next: as in many al- words in the Iberian peninsula (e.g. Spanish algodón ‘cotton’), it reflects the Arabic definite article. So, along with medieval Spanish alberengena, the Catalan item is from Arabic al-bādhinjān ‘the aubergine’, where only the bādhinjān bit will be relevant from here on. This connection makes sense, because the Arab conquest had such an impact on the history of Iberia. And more generally, we have the Arabs to thank for the spread of aubergine cultivation into the West, and also – indirectly – for this charming illustration in a 14th-century Latin translation of an Arabic health manual:

Illustration featuring three people in front of a stand of aubergine plants
Page from the 14th c. Tacuinum Sanitatis (Vienna), SN2644

But bādhinjān is not Arabic in origin either: it was borrowed into Arabic from its neighbour, Persian. In turn, Persian bādenjān is a borrowing from Sanskrit vātiṅgaṇa… and Sanskrit itself got this from some other language of India, probably belonging to the unrelated Dravidian family. The word for aubergine in Tamil, vaṟutuṇai, is an example of how the word developed inside Dravidian itself.

That is as far back as we are able to trace the word. But the journey has already been quite convoluted. To recap, a Dravidian item was borrowed into Sanskrit, from there into Persian, from there into Arabic, from there into Catalan, from there into French, and from there into English – and in the course of that process, it managed to go from something along the lines of vaṟutuṇai to the very different aubergine, although the individual changes were not drastic at any stage. The whole thing illustrates how developments in language can go with cultural change, in that words sometimes spread together with the things they refer to. In the same way, tea reached Europe via two routes originating in different Chinese dialect zones, and that is what gave rise to the split between ‘tea’-type and ‘chai’-type words in European languages:

[Map created by Wikimedia user Poulpy, licensed CC BY-SA 3.0, cropped for use here]
This still leaves a lot of aubergine words unaccounted for. But now that we have played the tape backwards all the way from aubergine back to something-like-vaṟutuṇai, we can run it forwards again, and see what different historical paths we could follow instead. For example, Arabic had an influence all over the Mediterranean, and so it is no surprise to see that about a thousand years ago, versions of bādhinjān start appearing in Greece as well as Iberia. Greek words could not begin with b- at the time, so what we see instead are things like matizanion and melintzana, and melitzana is the Greek for aubergine to this day. There is no good pronunciation-based reason for the Greek word to have ended up beginning with mel-, but what must have happened is that faced with this foreign string of sounds, speakers thought it would be sensible for it to sound more like melanos ‘dark, black’, to match its appearance. That is, they injected a bit of meaning into what was originally just an arbitrary label.

Meanwhile the word turns up in medieval Latin as melongena (giving the antiquated English melongene) and in Italian as melanzana, and a similar thing happened: here mel- has nothing to do with the dark colour of the fruit, but it did remind speakers of the word for ‘apple’, mela. We know this because melanzana was subsequently reinterpreted as the expression mela insana, ‘insane apple’. To produce this interpretation, it must have helped that the aubergine (like the equally suspicious tomato) belongs to the ‘deadly’ nightshade family, whose traditional European representatives are famously toxic. So, again, something that was originally just a word, with no deeper meaning inside, was reimagined so that it ‘made sense’. As a direct translation, English started calling the aubergine a mad-apple in the 1500s.

Parody of the "Keep Calm and Carry On" posters, reading "You don't have to be mad to work here but it helps"
Poster from a 16th c. aubergine factory

There are many more developments we could trace. For example, I have not talked at all about the branch of this aubergine ‘tree’ that entered the Ottoman Empire and from there spread widely across Europe and Asia. But instead I will return now to the Arab conquest of Iberia. This brought bādhinjān into Portuguese in the form beringela, and then when the Portuguese started making conquests of their own, versions of beringela appeared around the world. Notably, briñjal was borrowed into Gujarati and brinjal into Indian English, meaning that something-like-vaṟutuṇai ultimately came full circle, returning in this heavy disguise to its ancestral home of India. And to end on a particularly happy note, when the same form brinjal reached the Caribbean, English speakers there saw their own opportunity to ‘make sense’ of it – this time by adapting it into brown-jolly.

Brown-jolly is pretty close to the mark in terms of colour, and it is much better marketing than mela insana. But from the linguist’s point of view, they both reinforce a point which has often been made: speakers are always alive to the possibility that the expressions they use are not just arbitrary, but can be analysed, even if that means coming up with new meanings which were not originally there. To illustrate the power of ‘folk etymology’ of this kind, linguists traditionally turn to the word asparagus, reinterpreted in some varieties of English as sparrow-grass. But perhaps it is time for us to give the brown-jolly its moment in the sun.

Yesterday, Today and Tomorrow

How do we talk about time? This may seem a simple question with a simple answer; we are all human, surely we all experience time the same way? That may be true, but that doesn’t mean that all languages organise time in the same way. This is arguably most apparent when it comes to talking about the days either side of the present day. We all live on earth and so all experience a day-night cycle; all can understand how one day follows after another. However, the words we use to locate events in this cycle can vary wildly in their construction.

Let’s take a look at two languages, Scottish Gaelic and Sylheti, and see how their systems compare with that of English. All three of these languages belong to the same family, Indo-European, so it might be assumed that they show many similarities. And yet each still exhibits significant variation in how they talk about time.

Firstly, Scottish Gaelic. Like English, it distinguishes between ‘yesterday’, ‘today’ and ‘tomorrow’. The terms each show a consistent structure, combining a frozen prefix a(n)- with one of three morphologically opaque roots: an-dè, an-diugh and a-màireach respectively. Furthermore none of the Gaelic terms has any connection with the normal word for ‘day’, latha/là. Compare English, where yester-day and to-day both feature the word ‘day’, while to-day and to-morrow both feature a frozen prefix to- (historically a demonstrative). Additionally, there are also single terms for ‘last night’ as well as ‘tonight’ with a-raoir and a-nochd respectively, again with no immediately apparent connection with the normal term for ‘night’ oidhche. On the other hand, there is no single term for ‘tomorrow night’ so the compound expression oidhche a-màireach is used instead. There are also additional terms for ‘the day after tomorrow’ and ‘the day before yesterday’, an-earar and a bhòn-dè respectively, while the latter has a counterpart in a bhòn-raoir for ‘the night before last’. English is also reported to have had similar terms in the form of ereyesterday and overmorrow, though these have fallen out of usage in the modern day.

Gaelic is also in another respect slightly more regular than English in how it refers to parts of the day. While in English we have a split between ‘this morning’ and ‘yesterday morning’, Gaelic instead uses madainn an-diugh and madainn an-dè, where the former literally translates to ‘today morning’.

But all this is not really that surprising. All that really distinguishes Scottish Gaelic from English in this respect is which time categories are given single indivisible terms rather than compositional expressions; the fundamental organisation of the system is still broadly similar to English. To see a far more radically different system of organising time words, we will now turn to Sylheti, an Indo-Aryan language spoken in north-eastern Bangladesh by around 9-10 million and by perhaps a further 1 million in diaspora, including by most of the British Bangladeshi community.

Here, instead of distinguishing between ‘yesterday’ and ‘tomorrow’, we instead find a single term xail(ku), contrasting with aiz(ku) meaning ‘today’ (the -ku is a suffix which can optionally appear on a lot of ‘time’ words, such as onku ‘now’ or bianku ‘(this) morning’). The two senses of ‘tomorrow’ and ‘yesterday’ can be distinguished by combining them with goto ‘past’ and agami ‘future’, but just as commonly instead the distinction is solely marked by whether the verb is in the past or future tense, e.g. xailku ami amar bondu dexsi ‘I saw my friend yesterday’ vs. xailku ami amar bondu dexmu ‘I will see my friend tomorrow’.

This is not an isolated instance in the language, either, but in fact represents a consistent trend. So in the same manner foru can be either ‘the day before yesterday’ or ‘the day after tomorrow’ depending on context and toʃu the same but at one day further removed.

Table of day and night terms in English, Gaelic and Sylheti respectively
Visualising the systems

Nor is Sylheti unique in using this kind of system; it is also found in many parts of New Guinea, for example. Yimas, a language of northern New Guinea, also uses the same term ŋarŋ for both ‘yesterday’ and ‘tomorrow’, urakrŋ for ‘two days removed’ and so on, all the way up to manmaɲcŋ for ‘five days removed’. Once again whether the reference is to the past or future is carried by the choice of tense on the verb, though Yimas has a far more complex system than that seen in Sylheti, for instance distinguishing a near past -na(n) from a more remote past -ntuk~ntut.

Sylheti also has more fine-grained distinctions for parts of the day than either English or Scottish Gaelic. For example, if one wishes to say ‘in the morning’ one must decide whether one is talking about the early morning (ʃoxal) or the mid to late morning (bian). Additionally, while forms such as ‘yesterday/tomorrow afternoon’, ‘the night before last/after next’ and ‘yesterday/tomorrow morning’ use compound expressions (xail madan, foru rait and xail bian/ʃoxal respectively), to express ‘this morning/this afternoon/tonight’ the word for the part of the day (perhaps with the oblique suffix -e or a time suffix -ku) is sufficient by itself, for example amra ʃoxale Sylheʈ aisi ‘We arrived in Sylhet this morning’ or ami raitku dua xotram ‘I am praying tonight’ (with rait ‘night’).

This is just one small part of the temporal vocabulary, and only looking at representatives from a single family, and yet already we see great variation in how time is organised and discussed. It is not so much that these groups have fundamentally different conceptions of time, as these languages share a common ancestor and are only separated by a few thousand years. Instead, it is a testament to the fluidity of time itself, resulting in the words used to refer to it easily shifting in meaning and being reorganised over generations.

What slips of the tongue can tell us about language

“The grouchy knight cuddled the rowdy seer’s adorable puppy while devouring lasagne”

This is probably a sentence you’ve never heard – or produced – before. Yet this experience is not novel – every day, you make utterances you’ve never heard, and understand new ones.

Producing such utterances is not a trivial matter. To do this we have to generate them – that is, decide on the concept to be expressed, encode that into words and structures, then into the sounds that make up our words before sending instructions to our articulatory apparatus to produce the utterance. All within fractions of a second.

Yet, sometimes we make mistakes, and produce things we didn’t intend to:

Error (the mistake we make)           Target (what we intended to say)
heft lemisphere                       left hemisphere
squoor                                squeaky floor
a leading list                        a reading list
gave the goy                          gave the boy
stough competition                    stiff/tough competition
she sliced the knife with a salami    she sliced the salami with a knife
a hole full of floors                 a floor full of holes


We usually notice these errors when we make them and correct ourselves. But rather than being merely slips of tongue, they are a goldmine of information as they demonstrate breakdowns at various parts in the speech production process.

Some of these errors are lexical selection errors – we select the wrong lexical concept or lemma for the message we’re trying to convey. That is, we pick the wrong word from our mental dictionary. This can be simply the wrong concept, as in: ‘he’s carrying a bag of cherries’ instead of ‘grapes’. Sometimes, we can combine words together in blends: ‘the competition is getting a little stough’ instead of stiff or tough. Other times, we can exchange words within a sentence, as in ‘she sliced the knife with a salami’, rather than ‘she sliced the salami with a knife’.

We can also make phonological errors, that is, errors in the sound structure of our words:

Error                 Target
heft lemisphere       left hemisphere
fleaky squoor         squeaky floor
cheek and ch[ɔː]se    chalk and cheese
enjoyding it          enjoying it
cumsily               clumsily
leading list          reading list
gave the goy          gave the boy


We can look at large data sets, or corpora, to see what kinds of errors are commonly made. We find that these errors are still well-formed in terms of their sound structure, or phonology. 60-90% of errors (depending on the corpus you look at) involve errors with a single sound or segment, and these errors are sensitive to syllable structure. That is, we might swap segments from the same part of the syllable as in exchanges:

face spood < space food

Or we might combine the beginning of one syllable and the end of another:

grool < great + cool

We also like to swap sounds that are similar to each other, so

paid mossible < made possible

is more likely than

two sen pet < two pen set

There are exceptions to these generalisations of course – but they are rare.
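
The syllable-sensitive exchanges and blends above can be mimicked with a few lines of Python. This is a rough sketch that assumes spelling is a good enough stand-in for sound (it is not, in general: try it on "made possible") and treats everything before the first vowel letter as the syllable onset. The function names `split_onset`, `exchange_onsets` and `blend` are invented for this illustration.

```python
import re

# Everything before the first vowel letter counts as the onset (an
# orthographic approximation of the phonological onset).
ONSET = re.compile(r"^([^aeiou]*)(.*)$")


def split_onset(word):
    """Split a word into (onset, rest) using the spelling-based rule above."""
    match = ONSET.match(word.lower())
    return match.group(1), match.group(2)


def exchange_onsets(first, second):
    """Swap the onsets of two words, as in an exchange error."""
    onset1, rest1 = split_onset(first)
    onset2, rest2 = split_onset(second)
    return onset2 + rest1, onset1 + rest2


def blend(first, second):
    """Combine the onset of one word with the remainder of another."""
    onset1, _ = split_onset(first)
    _, rest2 = split_onset(second)
    return onset1 + rest2


print(exchange_onsets("space", "food"))
print(blend("great", "cool"))
```

Because whole onsets are swapped, the output always keeps a well-formed syllable shape, which is the point the corpus figures make: errors respect syllable structure.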

Speech errors give us an insight into normal speech production processes. The fact that sound errors occur at all tells us that speech production is a generative process – it is not that we just reproduce fully formed stored sentences, but rather we create each utterance afresh each time. In order to mix or swap two elements, both must be activated at the same point of the production process.

Furthermore, the range of speech across which errors can occur implies that the span of processing is greater than a single word. You might be familiar with spoonerisms, popularised by Dr William Archibald Spooner:

  • You were caught fighting a liar in the quad < You were caught lighting a fire in the quad
  • You have hissed my mystery lectures < You have missed my history lectures
  • You have tasted the whole worm < You have wasted the whole term
    We must plan more than a word ahead for errors like these to happen.

    There is a much wider array of questions we can ask about speech production than can be answered by speech errors, but certainly they are an entertaining place to start.

    The linguistic archaeology of feet

    There’s been excitement recently about evidence that humans had set foot in the Americas as much as 22,500 years ago, pushing back the previous best estimate by almost ten thousand years. And by ‘set foot’, I mean literally. The tell-tale new evidence comes to us in the form of imprints left by human feet in a particularly well-preserved mudflat in New Mexico. So far, the humans themselves have not been uncovered by archaeologists, but their characteristic mark upon the mud has endured.

    When linguists peer into the past, we also will occasionally use the imprints, left by something which has otherwise been lost, to infer its presence long ago — all of which brings us to the topic of feet, and not the kind that you’d use to walk across a mudflat, but the literal English word ‘feet’, which itself contains a wonderful imprint of a long-lost vowel.

    Our story begins with the fact that in English, the word ‘feet’ is a little odd. It’s a plural that doesn’t end in ‘s’. As any child will tell you, you can’t get away with saying ‘foots’ for the plural of ‘foot’ for very long before someone bigger than you corrects it to ‘feet’. However, given that most English nouns do use an ‘s’ plural, it’s entirely sensible to ask why ‘feet’ is different. (Of course, ‘feet’ isn’t absolutely unique: English contains a select club of other, similar plurals like ‘geese’ and ‘teeth’, to which we’ll return in a minute.)

    The tale of ‘feet’ begins around two millennia ago, when it was in fact a regular plural word. In Proto-Germanic, the singular form would have been ‘fōt-s’ (pronounced approximately as fohts, where ‘ō’ is a long ‘o’ sound) and its corresponding plural ‘fōt-iz’, constructed with a simple plural suffix ‘-iz’. Over the following centuries, the sounds at the end of the plural form were worn away and eventually lost, as often happens during language change. However, before the suffix disappeared entirely, the ‘i’ vowel in it left its imprint on the ‘ō’ vowel, changing it to ‘ȫ’, which is to say ‘fōtiz’ became ‘fōti’ then ‘fȫti’ then ‘fȫt’ which by Old English had become ‘fēt’ and is now ‘feet’. In the meantime, the singular form ‘fōts’, which contained no ‘i’ vowel, changed very little indeed: it lost its suffix ‘-s’, becoming ‘fōt’ and then modern English ‘foot’. A similar story lies behind the plurals ‘geese’ and ‘teeth’: an original suffixal vowel ‘i’ changed ‘ō’ into ‘ȫ’, before disappearing, then ‘ȫ’ became ‘ē’.
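
The chain of changes just described can be replayed as a list of ordered rewrite rules. The sketch below is a toy: the rule names, their regular expressions and the position of the ‘-s’ loss in the ordering are assumptions made for illustration, and the rules are only meant to cover the two forms of ‘foot’ discussed here, not Germanic sound change in general.

```python
import re

O_LONG = "\u014d"   # ō, long o
O_FRONT = "\u022b"  # ȫ, the umlauted (fronted) long o
E_LONG = "\u0113"   # ē, long e

# Ordered (name, pattern, replacement) rules. The ordering matters: umlaut
# has to apply while the 'i' that triggers it is still present.
SOUND_CHANGES = [
    ("loss of final z", r"z$", ""),
    ("i-umlaut", O_LONG + r"(?=[a-z]*i)", O_FRONT),
    ("loss of final i", r"i$", ""),
    ("unrounding", O_FRONT, E_LONG),
    ("loss of final s after t", r"(?<=t)s$", ""),
]


def derive(form):
    """Apply each rule once, in order, recording every stage that differs."""
    stages = [form]
    for _name, pattern, replacement in SOUND_CHANGES:
        changed = re.sub(pattern, replacement, stages[-1])
        if changed != stages[-1]:
            stages.append(changed)
    return stages


print(" > ".join(derive("f" + O_LONG + "tiz")))
print(" > ".join(derive("f" + O_LONG + "ts")))
```

Moving the "loss of final i" rule above "i-umlaut" makes the plural come out with an unchanged vowel, which is one way to see why the imprint had to be left before the suffix vanished.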

    You might say that the ‘i’ vowel left its imprint upon original ‘ō’ in the form of the altered vowel ‘ȫ’. One tool which linguistic archaeologists put to good use is our knowledge of the characteristic imprints that one sound can leave upon another. In the case of the long-lost ‘i’ vowel, the imprint even has a name, umlaut. Historical umlaut is also what lies behind plurals like ‘mice’ and ‘men’.

    Armed with the background knowledge that lost ‘i’ vowels changed ‘ō’ into ‘ȫ’, and in doing so gave rise to modern English alternations between ‘oo’ and ‘ee’, we can now go fossicking through the vocabulary for more lost ‘i’ vowels. Another suffix that was lost over the centuries was a causative suffix, which related nouns to verbs, such as ‘blood’ to ‘bleed’, or ‘food’ to ‘feed’: as you’ll have guessed, the verbs once contained a now-lost ‘i’. In some cases, pairs of sibling words such as these have grown apart over time. For instance, if you were to decide someone’s fate (or their ‘doom’) then you’d be judging them (or ‘deeming’ them), though as you can see, I had to produce a fairly contrived context to highlight the relatedness of ‘doom’ and ‘deem’.

    Umlaut caused by a now-lost ‘i’ also crops up in several nouns ending in ‘-th’: compare not only ‘strong’ with ‘strength’, ‘long’ with ‘length’, or ‘broad’ with ‘breadth’, but also ‘hale’ with ‘health’ and ‘foul’ with ‘filth’.

    feet made filthy by umlaut!

    Over decades of meticulous work, linguists have uncovered much about how languages around the world change over time, though much more still remains to be accounted for. One of the many lingering questions is which conditions favour the continued survival of idiosyncratic word forms like ‘feet’, long after they have lost their regularity. We know that many irregular words, such as the Old English plural ‘bēc’ for ‘books’ (corresponding to singular ‘bōc’), get removed over time, yet others persist for millennia. It’s an ongoing task for linguists to understand why some footprints remain while others get washed away.

    SMG – I’d Arapaho, Roon, Sala, Tubar and Nara, but alas no Oroha paradigms

    A palindrome is a linguistic delight: it reads the same in both directions. For example: level. Or Anna, or indeed Hannah. This is a visual trick: if you record yourself saying one of these words and play the recording backwards, it won’t sound exactly the same.

    Palindromes hit the big time in the parrot sketch. They were also promoted by ABBA, with their top hit SOS!

    Here’s a nice one from North Ambrym (an Oceanic language spoken in Vanuatu): rrirrirr ‘sound a rat makes when you try and kill it but you miss it’. And a long one from Estonian: kuulilennuteetunneliluuk ‘bullet flying trajectory tunnel’s hatch’. I’m not sure that one is used much (except in blogs about palindromes).

    We can go up a level (!), as it were, to palindromic phrases. A famous one of these is:

    A man, a plan, a canal – Panama!

    This has been around at least since 1948. It has often been extended, as in this version due to Guy Jacobson:

    A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal – Panama!

    And here’s a Russian sentence palindrome: Рислинг сгнил, сир. ‘the Riesling has gone off, sir’ More Russian palindromes at For French sentence palindromes go to And there are even songs based on such palindromes:

    They have palindromes in American Sign Language:

    Not surprisingly, palindromes don’t translate. Though we can go up another level (!) of cleverness, to the bilingual palindrome: I love / e voli. This is half English, half Italian, and overall a palindrome. More of these at It’s truly amazing what people can create, including whole poems as palindromes:

    Some time ago, I mentioned to linguist colleagues that Malayalam (a Dravidian language of southern India) is a palindromic language. One colleague’s eyes opened wide, and he asked whether it was palindromic at the word level or the sentence level. What a great idea! Of course, it’s just the name which is a palindrome (just as Anna is a palindrome but that doesn’t make Anna a palindromic person – there are deep issues here: what does a name refer to?).

    It turns out that there are over seventy “palindromic languages”, including some that are central to our research in SMG, notably Iaai (spoken in New Caledonia). Here are some more: Efe, Ewe, and Atta.

    What then of E (also called Wuse/Wusehua), a Tai-Chinese mixed language, of Guangxi, China? Yes, it’s a palindrome, just not a very impressive one. Just as the English pronoun I is a palindrome, though hardly one to get excited about (unless you’re called Anna or Hannah of course). But it gets much better. You may have noticed that linguists increasingly give three-letter codes after language names. These are the ISO codes that we use to uniquely identify a language, to make sure that we’re talking about, say, the language Aja (a Nilo-Saharan language of Sudan), ISO code aja, and not Aja (a Niger-Congo language of Benin), ISO code ajg. So, what is the ISO code for the language E? It’s eee. The language name and the code are both palindromes! Similarly there’s U (an Austroasiatic language of the Yunnan Province of China), ISO code uuu.

    Here are the languages which are doubly palindromic (name and ISO code):

    Name ISO code
    E eee
    Efe efe
    Ewe ewe
    Iaai iai
    Kerek krk
    Naman lzl
    Mam mam
    Nen nqn
    Ofo ofo
    Ososo oso
    Utu utu
    U uuu
    Yoy yoy
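
    The table can be checked mechanically. Here is a minimal Python sketch, assuming that "palindrome" means the letters read the same in both directions once case is ignored; the helper name is_palindrome and the dictionary LANGUAGES are my own labels, and the pairs are copied from the table above.

```python
def is_palindrome(text: str) -> bool:
    """Return True if text reads the same in both directions, ignoring case."""
    folded = text.casefold()
    return folded == folded[::-1]


# Language name -> ISO code, copied from the table above.
LANGUAGES = {
    "E": "eee", "Efe": "efe", "Ewe": "ewe", "Iaai": "iai", "Kerek": "krk",
    "Naman": "lzl", "Mam": "mam", "Nen": "nqn", "Ofo": "ofo", "Ososo": "oso",
    "Utu": "utu", "U": "uuu", "Yoy": "yoy",
}

# Keep only the languages whose name AND code are both palindromes.
doubly_palindromic = [
    name for name, code in LANGUAGES.items()
    if is_palindrome(name) and is_palindrome(code)
]

print(len(doubly_palindromic), "of", len(LANGUAGES), "are doubly palindromic")
```

    The same helper accepts Malayalam as a palindromic name, which is all that the "palindromic language" joke needs.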

    A real star is Naman, whose ISO code is quite different, lzl, but still palindromic. Where does that come from? Well, the language has an alternative name, Litzlitz, so when it’s not a palindrome it’s a reduplication!

    Back to the tricky use of “palindromic language”. Iaai is a palindromic name. As we’ve seen, its ISO code iai is also a palindrome. And the language does have some very nice palindromes:

    • aba ‘caress’
    • ee ‘locative – near the interlocutor’
    • ii ‘to suck’
    • iei ‘to hurt, cause pain’
    • ikiiki ‘repugnant’
    • iwi ‘rudder’
    • komok ‘sick’
    • maam ‘your manner’
    • mem ‘Napoleon fish (Cheilinus undulatus)’
    • omoomo ‘women’
    • nokon ‘his/her infant’
    • oṇo ‘Barracuda (Sphyraena sp.)’
    • öö ‘spear’
    • ölö ‘mount, embark, disembark’
    • ölö ‘legume (Pueraria sp.)’
    • u ‘an old word for yam’
    • uu ‘fall from a height, chop down (of tree)’
    • ûû ‘a dispute, to dispute’
    • ûcû ‘similar, same’ (a nice meaning for a palindrome!)
    • ûcû ‘to exchange, buy, shop’

    It would be impressive if you could read this post backwards, and have it make sense. But that wouldn’t be a BLOG but a GLOB, the latter being an instance of a Semordnilap, but that is another story. For now, we welcome your favourite palindromes, in any language, in the comments.

    For examples, thanks to Jenny Audring, Sacha Beniamine, Marina Chumakina, Mike Franjieh, Erich Round and Anna Thornton, and for the title (you’ve guessed what sort of title that is!), thanks to Steven Kaye.

    A “let’s circle back” guy

    As everyone knows by now, for the foreseeable future we must all stay at home as much as possible to slow the spread of COVID-19 and reduce the burden on our health services – which has already been substantial, and will soon be enormous even in the best possible scenario.

    This shift in the way we operate as a society will have a wide range of effects on our lives, which are already being noticed. Some of these were the kind of thing you might have thought of in advance – but others less so. For example, soon after the advice to work from home really started to bite in the US, a substantial thread developed on Twitter, all started off by the following tweet:

    The thousands of responses that appeared within a few hours of this tweet show how deeply it resonated: many people must have been through their own version of the same surprising experience, some of them presumably in the last few days. But what happened here, and why was it so surprising? And why, as a linguist, am I sitting at home and writing a blog post about it now?

    This single tweet, which people found so easy to identify with, in fact brings together a number of issues that linguists are interested in. For one thing, it works as a clear illustration of a point that people intuitively appreciate, but which has endless ramifications: the language you use is never just an instrument for communicating your thoughts, but is also taken to say something important about your identity, whether you intend it to or not. If a guy uses the expression “let’s circle back”, meaning to return to an issue later, that makes him a “let’s circle back” guy – that is, a particular kind of person. In a jokey way, the tweeter is implying that she already had a mental category of ‘the kind of person who would say things like that’, and she takes it for granted that we do too. In this case, the surprise for Laura Norkin was in suddenly discovering that her own husband belonged in that pre-existing category: the way she tells it, hearing him use a specific turn of phrase counted as finding out important new information about who he is as a person, which she was not necessarily best pleased about.

    Making a linguistic choice: a bilingual road sign in Wales

    Since the mid-twentieth century, the field of sociolinguistics has drawn attention to the fact that this kind of thing is going on everywhere in language. Consciously or unconsciously, people are making linguistic choices all the time – whether that means choosing between two totally different languages, between two different expressions with the same meaning (do you circle back to something or just return to it?), or between two very slightly different pronunciations of the same word. Any of these choices might turn out to ‘say something’ about how you see yourself – or how other people see you. And the social meanings and values assigned to the different choices are likely to change over time: so understanding what is going on with one person’s use of language really requires you to understand what is going on right across the community, which is like an ecosystem full of co-existing language diversity. How do linguistic developments, and the social responses to them, propagate and interact in this ecosystem? That’s something that researchers work hard to find out.

    The tweet also picks up on the importance of the situational context for the way people use language. Laura Norkin had never heard her husband use the offending expression before because it belongs to a particular register – meaning a variety of language which is characteristic of a particular sphere of activity. Circling back is characteristic of ‘full work mode’, something which had never previously needed to surface in the domestic setting.

    Why do registers exist? Partly it must be to do with the fact that different people know different things: for example, lawyers can expect to be able to use technical legal terminology with their colleagues, but not with their clients, even if they are talking about all the same issues – because behind the terminology there lies a wealth of specialist knowledge. Similarly, anyone would modify their language when talking to a five-year-old as opposed to a fifty-year-old.

    But this cannot be the whole story: it doesn’t help you to explain the difference between returning and circling back. Should we think of the business/marketing/management world, where terms like circling back are stereotypically used, as a mini community within the community, with its own ideas of what counts as normal linguistic practice? Or is everyone involved giving a signal that they take on a new, businesslike identity when they turn up to the office – even if these days that doesn’t involve leaving the house? Again, working out the relationship between the language aspect and the social aspect here makes an interesting challenge for linguistics.

    The medical profession is well known for having its own technical register

    But this was not just an anecdote about how unusual it is to be at home and yet hear terms that usually turn up at work. We can tell that “let’s circle back”, just like other commonly mocked corporate expressions such as “blue-sky thinking” or “push the envelope”, is something we are expected to dislike – but why? The existence of different registers is not generally thought of as a bad thing in itself. You could give the answer that this expression is overused, a cliché, and thus sounds ugly. But really, things must be the other way round: English abounds in commonly used expressions, and only the ones that ‘sound ugly’ get labelled as overused clichés. And there is nothing inherently worse about circle back than about re-turn – in fact, when you think about it, they are just minor variations on the same metaphor.

    So what is really going on here? The popular reaction to circle back, and other things of that kind, seems to involve lots of factors at once. The expression is new enough that people still notice it; but it is not unusual enough to sound novel or imaginative. It is currently restricted to a particular kind of professional setting that most people never find themselves in; but it does not refer to a complex or specific enough concept to ‘deserve’ to exist as a technical term. And we do not tend to worry too much about making fun of the linguistic habits of people who have a relatively privileged position in society: certainly, teasing your husband by outing him as a “let’s circle back” guy is not really going to do him any harm.

    Spelling it out like this helps to suggest just how much information we are factoring in whenever we react to the linguistic behaviour of the people around us – and this is something we do all the time, mostly without even noticing. We are social beings, and cannot help looking for the social message in the things people say, as well as the literal message: establishing this fact, and working out how to investigate it scientifically, has been one of the great overarching projects of modern linguistics. Right now, for everyone’s benefit, we need to learn how to be less sociable than ever. But as the tweet above suggests, people’s inbuilt sensitivity to language as a social code is not going to change any time soon.

    Cushty Kazakh

    With thousands of miles between the East End of London and the land of Kazakhs, cushty was the last word one expected to hear one warm spring afternoon in the streets of Astana (the capital of Kazakhstan, since renamed Nur-Sultan). The word cushty (meaning ‘great, very good, pleasing’) is usually associated with the Cockney dialect of the English language which originated in the East End of London.

    Del Boy from Only Fools and Horses

    Check out Del Boy’s Cockney sayings (Cushty from 4:04 to 4:41).

    Cockney is still spoken in London now, and the word is often used to refer to anyone from London, although a true Cockney would disagree with that, and would proudly declare her East End origins. More specifically, a true ‘Bow-bell’ Cockney comes from the area within hearing distance of the church bells of St. Mary-le-Bow, Cheapside, London.

    Due to its strong association with modern-day London, the word ‘Cockney’ might be perceived as being one with a fairly short history. This could not be further from the truth as its etymology goes back to a late Middle English 14th century word cokenay, which literally means a “cock’s egg” – a useless, small, and defective egg laid by a rooster (which does not actually produce eggs). This pejorative term was later used to denote a spoiled or pampered child, a milksop, and eventually came to mean a town resident who was seen as affected or puny.

    The pronunciation of the Cockney dialect is thought to have been influenced by Essex and other dialects from the east of England, while the vocabulary contains many borrowings from Yiddish and Romany (cushty being one of those borrowings – we’ll get back to that in a bit!). One of the most prominent features of Cockney pronunciation is the glottalisation of the sound [t], which means that [t] is pronounced as a glottal stop: [ʔ]. Another interesting feature of Cockney pronunciation is called th-fronting, which means that the sounds usually represented by the letter combination th ([θ] as in ‘thanks’ and [ð] as in ‘there’) are replaced by the sounds [f] and [v]. These (and some other) phonological features characteristic of the Cockney dialect have now spread far and wide across London and other areas, partly thanks to the popularity of television shows like “Only Fools and Horses” and “EastEnders”.

    As far as grammar is concerned, the Cockney dialect is distinguished by the use of me instead of my to indicate possession; heavy use of ain’t in place of am not, is not, are not, has not, have not; and the use of double negation which is ungrammatical in Standard British English: I ain’t saying nuffink to mean I am not saying anything.

    Having borrowed words, Cockney also gave back generously, with derivatives from Cockney rhyming slang becoming a staple of the English vernacular. The rhyming slang tradition is believed to have started in the early to mid-19th century as a way for criminals and wheeler-dealers to code their speech beyond the understanding of police or ordinary folk. The code is constructed by way of rhyming a phrase with a common word, but only using the first word of that phrase to refer to the word. For example, the phrase apples and pears rhymes with the word stairs, so the first word of the phrase – apples – is then used to signify stairs: I’m going up the apples. Another popular and well-known example is dog and bone – telephone, so if a Cockney speaker asks to borrow your dog, do not rush to hand over your poodle!
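
    The construction just described is simple enough to sketch in a few lines of Python. This is only an illustration of the rule as stated here (keep the first word of the rhyming phrase, drop the part that rhymes); the names RHYMING_PHRASES, slang_code and decode are my own, and the two entries are the examples given above.

```python
# Rhyming phrase -> the everyday word it rhymes with (both pairs are from the text).
RHYMING_PHRASES = {
    "apples and pears": "stairs",
    "dog and bone": "telephone",
}


def slang_code(phrase: str) -> str:
    """Keep only the first word of the rhyming phrase; the rhyming part is dropped."""
    return phrase.split()[0]


# Slang word -> plain meaning, built by applying the rule to every phrase.
CODEBOOK = {slang_code(phrase): meaning for phrase, meaning in RHYMING_PHRASES.items()}


def decode(sentence: str) -> str:
    """Replace every slang word in the sentence with its plain meaning."""
    return " ".join(CODEBOOK.get(word, word) for word in sentence.split())


print(decode("I'm going up the apples"))
```

    Because the rhyming half never surfaces, nothing in the coded word itself points to its meaning. A naive decoder like this one would also turn every literal dog into a telephone, which is exactly the poodle problem.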

    Test your knowledge of Cockney rhyming slang!

    Right, so did I encounter a Cockney walking down the field of wheat (street!) in Astana saying how cushty it was? Perhaps it was a Kazakh student who had recently returned from his studies in London and couldn’t quite switch back to Kazakh? No and no. It was a native speaker of Kazakh reacting in Kazakh to her interlocutor’s remark on the new book she’d purchased by saying күшті [kyʃ.tɨˈ] which sounds incredibly close to cushty [kʊˈʃ.ti]. The meanings of the words and contexts in which they can be used are remarkably similar too. The Kazakh күшті literally means ‘strong’, however, colloquially it is used to mean ‘wonderful, great, excellent’ – it really would not be out of place in any of Del Boy’s remarks in the YouTube video above! Surely, the two kushtis have to be related, right? Well…

    Recall that cushty is a borrowing from Romany (Indo-European) kushto/kushti, which, in turn, is known to have borrowed from Persian and Arabic. In the case of the Romany kushto/kushti, the borrowing could have been from the Persian khoši meaning ‘happiness’ or ‘pleasure’. It would have been very neat if this could be linked to the Kazakh күшті, however, there seems to be no connection there… Kazakh is a Turkic language and the etymology of күшті can be traced back to the Old Turkic root küč meaning ‘power’, which does not seem to have been borrowed from or connected with Persian. Certainly, had we been able to go back far enough, we might have found a common Indo-European-Turkic root in some Proto-Proto-Proto-Language. As things stand now, all we can do is admire what appears to be a wonderful coincidence, and enjoy the journeys on which a two-syllable word you’d overheard in the street might take you.




    Those who have chosen, out of desire, or been forced, out of dire necessity, to bake their own bread may have encountered the term poolish. It refers to a semi-liquid pre-ferment used in bread-making, a mixture of half water and half white flour mixed with a teeny bit of yeast and allowed to slowly ferment for several hours, up to a day, before mixing up the final dough.

    The word itself is an exceedingly odd one, and has been the source of much head-scratching and inconclusive speculation among bread-bakers across the world: it looks like the English word Polish, but is spelled funny, and anyway seems to be borrowed from French, where the spelling would be funnier still. Most discussions of the technique include the obligatory etymological digression, usually fantastical, involving journeymen Polish bakers fanning out over Europe. Linguists too have gotten on the trail: David Gold’s Studies in Etymology and Etiology (2009) devotes a whole page to the question, but does not get too far.

    In its current form it is technical jargon from French commercial baking, and has probably made its way to a broader public through Raymond Calvel’s influential Le goût du pain (‘The taste of bread’) from 1990. In his account:

    This method of breadmaking was first developed in Poland during the 1840s, from whence its name. It was then used in Vienna by Viennese bakers, and it was during this same period that it became known in France. (2001 edition translated by Ronald Wirtz)

    This explanation has been widely accepted, and appears in one form or another in any number of bread-baking books. But how could it even be true? The first problem is the word itself. Poolish is not the French word for Polish, and doesn’t much look like a French word anyway. In earlier French texts it crops up as pouliche, which looks more French and is indeed the word for a young mare, whose connection to bread dough is tenuous at best. But earlier French texts also have the spelling poolisch or polisch, which looks rather more German than French and suggests we follow the Viennese trail instead.

    This thread of inquiry has its own potential hiccoughs. The German word for Polish is polnisch, with an [n], so would this not just be fudging things? Actually not: polisch, poolisch, pohlisch or pollisch turn up often enough in older texts as alternative words for ‘Polish’, particularly in southern varieties of German that include Austria. And it is exactly in these forms that we find it being used to refer to this particular process, juxtaposed with Dampfl (or Dampfel or Dampel), the term in southern Germany and Austria for a rather stiffer pre-ferment which goes through a shorter rising period, as in these two examples from 1865, one from Leopold Wimmer’s self-published advertising screed for St. Marxer brand (of Vienna) pressed yeast, where it turns up as Pohlisch:

    the other from Ignaz Reich’s (of Pest, as in Budapest) account of ancient Hebrew baking practices, where it’s rendered as pollisch.

    The term polisch (in all its variants) in this sense seems to have died a natural death in German, only to reemerge during the current craft-baking revival in the guise of poolish.

    But if poolish was originally the (or a) German word for Polish, we run up against the sticky question of what it was actually referring to. Calvel repeats the story that this technique was invented by Polish bakers (which turns up in a 1972 article in The Atlantic Monthly, I think anyway, because it’s but coyly revealed by Google in snippet view), a supposition which lacks as much plausibility as it does historical attestation. Poland has traditionally been a land of sourdough rye bread. It seems unlikely that a novel technique involving the use both of white wheat flour and commercial pressed yeast (a relatively new product) would have been devised there and introduced into the imperial capital that was Vienna. So what on earth could it have meant?

    Here I make my own foray into speculation; you read it here first. Poland is not just a land of sourdough rye bread, it is a land of a soup made from rye sourdough: żur or żurek (itself derived from sur, one variant of the German word for ‘sour’), still widely consumed and also sold in ready-made form for time-strapped gourmands. Since the Austro-Hungarian Empire included much of what had once been Poland, it isn’t too far-fetched to think that people in Vienna might have been familiar with this soup. And since the salient characteristic of poolish is that it is basically liquid, in opposition to more solid doughs, my guess is that the term poolish arose as a facetious allusion to żur: a soup-like fermenting dough mixture, like the thinned-out sourdough soup that Poles eat.

    This theory has the minor drawback of lacking any positive evidence in its favor. So far the only 19th century reference to żur outside of its normal context that I have been able to find is as a cure for equine distemper, otherwise known as ‘strangles’. That leads us into the topic of pluralia tantum disease names…

    Sense and polarity, or why meaning can drive language change

    Generally a sentence can be negative or positive depending on what one actually wants to express. Thus if I’m asked whether I think that John’s new hobby – say climbing – is a good idea, I can say It’s not a good idea; conversely, if I do think it is a good idea, I can remove the negation not to make the sentence positive and say It’s a good idea. Both sentences are perfectly acceptable in this context.

    From such an example, we might therefore conclude that any sentence can be made positive by removing the relevant negative word – most often not – from the sentence. But if that is the case, why is the non-negative response I like it one bit odd, even unacceptable, when its negative counterpart I don’t like it one bit is perfectly acceptable and natural?

    This contrast has to do with the expression one bit: notice that if it is removed, then both negative and positive responses are perfectly fine: I could respond I don’t like it or, if I do like it, I (do) like it.

    It seems that there is something special about the phrase one bit: it wants to be in a negative sentence. But why? It turns out that this question is a very big puzzle, not only for English grammar but for the grammar of most (all?) languages. For instance in French, the expression bouger/lever le petit doigt ‘lift a finger’ must appear in a negative sentence. Thus if I know that John wanted to help with your house move and I ask you how it went, you could say Il n’a pas levé le petit doigt, lit. ‘He didn’t lift the small finger’, if he didn’t help at all, but you could not say Il a levé le petit doigt, lit. ‘He lifted the small finger’, even if he did help to some extent.

    Expressions like lever le petit doigt ‘lift a finger’, one bit, care/give a damn, own a red cent are said to be polarity sensitive: they only really make sense if used in negative sentences. But this in itself is not the most interesting property.

    What is much more interesting is why they have this property. There is a lot of research on this question in theoretical linguistics. The proposals are quite technical but they all start from the observation that most expressions that need to be in a negative context to be acceptable are expressions of minimal degrees and measures. For instance, a finger or le petit doigt ‘the small finger’ is the smallest body part one can lift to do something, a drop (in the expression I didn’t drink a drop of vodka yesterday) is the smallest observable quantity of vodka, etc.

    Regine Eckardt, who has worked on this topic, formulates the following intuition: ‘speakers know that in the context of drinking, an event of drinking a drop can never occur on its own – even though a lot of drops usually will be consumed after a drinking of some larger quantity.’ (Eckardt 2006, p. 158). However, the intuition goes, the occurrence of this expression in a negative sentence is acceptable because it denies the existence of events that consist of just drinking one drop.

    What this means is that if Mary drank a small glass of vodka yesterday, although it is technically true to say She drank a drop of vodka (since the glass contains many drops) it would not be very informative, certainly not as informative as saying the equally true She drank a glass of vodka.

    However, imagine now that Mary didn’t drink any alcohol at all yesterday. In this context, I would be telling the truth if I said either one of the following sentences: Mary didn’t drink a glass of vodka or Mary didn’t drink a drop of vodka. But now it is much more informative to say the latter. To see this, consider the following: saying Mary didn’t drink a glass of vodka could describe a situation in which Mary didn’t drink a glass of vodka yesterday but she still drank some vodka, maybe just a spoonful. If, however, I say Mary didn’t drink a drop of vodka, then this can only describe a situation where Mary didn’t drink a glass or even a little bit of vodka. In other words, saying Mary didn’t drink a drop of vodka yesterday is more informative than saying Mary didn’t drink a glass of vodka yesterday because the former sentence describes a very precise situation whereas the latter is a lot less specific as to what it describes (i.e. it could be uttered in a situation in which Mary drank a spoonful of vodka or maybe a cocktail that contains 2ml of vodka, etc.).
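
    One rough way to make "more informative" concrete is to count the situations a sentence is compatible with: the fewer it allows, the more it tells you. The Python sketch below is my own toy model, not Eckardt's formalisation. The millilitre values for a drop, a spoonful and a glass are assumptions chosen only for illustration, and "drank a glass" is read as "drank at least a glass".

```python
# Assumed quantities in millilitres; only their ordering matters for the argument.
DROP_ML = 0.05
GLASS_ML = 40.0

# Candidate situations: how much vodka Mary drank yesterday
# (nothing, 2 ml in a cocktail, a spoonful, a glass, several glasses).
SITUATIONS = [0.0, 2.0, 15.0, 40.0, 100.0]


def drank_at_least(amount_ml):
    """Predicate for 'Mary drank (at least) this much vodka'."""
    return lambda drunk_ml: drunk_ml >= amount_ml


def negate(predicate):
    """Predicate for the negated sentence."""
    return lambda drunk_ml: not predicate(drunk_ml)


def compatible(predicate):
    """Set of situations in which the sentence would be true."""
    return {s for s in SITUATIONS if predicate(s)}


drank_drop = compatible(drank_at_least(DROP_ML))
drank_glass = compatible(drank_at_least(GLASS_ML))
no_drop = compatible(negate(drank_at_least(DROP_ML)))
no_glass = compatible(negate(drank_at_least(GLASS_ML)))

# Positive sentences: 'a glass' rules out more situations than 'a drop'.
print("positive, glass is more informative:", drank_glass < drank_drop)
# Negative sentences: the ranking flips, 'a drop' now rules out more.
print("negative, drop is more informative:", no_drop < no_glass)
```

    Here `<` on sets means "is a proper subset of". Under negation the minimal quantity leaves only the situation in which Mary drank nothing, which is the reversal the paragraph above describes.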

    By using expressions of minimal degrees/measures in negative environments, the sentences become a lot more informative. This, it seems, is part of the reason why languages like English have changed such that these words are now only usable in negative sentences.