Author: Sacha Beniamine

Is twote the past of tweet?

Have you ever encountered the form twote as a past tense of the verb to tweet? It is something of a meme on Twitter, and a live example of analogy (and its mysteries). However surprising the form may sound if you have never encountered it, it has been the prescribed one for a long time:

https://twitter.com/Twitter/status/47851852070522880?s=20

Ten years later, the question popped up among a linguisty Twitter crowd, where a poll again elected twote as the correct form:

It is clear that this unusual form replacing tweeted is some sort of joke, but why specifically twote? Here and there I saw references to the verb to yeet, a slang verb very popular on the internet and meaning more or less “to throw”. Rather than a regular form yeeted, the past of to yeet is often taken to be yote. The choice of an irregular form is probably meant to produce a comedic effect.

This, precisely, is analogical production: creating a new form (twote) by extending a contrast seen in other words (yeet/yote). Analogy is a central topic in my research. I have been trying to answer questions such as: How do we decide what form to use? How difficult is it to guess? How does this contribute to language change?

But first, have you answered the poll?

What is the past tense of “to tweet”?

To investigate further why we would say twote rather than tweeted, I took out my PhD software (Qumin). Based on 6064 examples of English verbs[1], I asked Qumin to produce and rank possible past forms of tweet[2]. To do so, it read through examples to construct analogical rules (I call them patterns), then evaluated the probability of each rule among the words which sound like tweet.

https://twitter.com/cavaticat/status/1212056421082251265

Qumin found four options[3]: tweeted (/twiːtɪd/), by analogy with 32 similar words, such as greet/greeted; twet (/twɛt/), by analogy with words like meet/met; tweet (/twiːt/), by analogy with words like beat/beat; and finally twote (/twəˑʊt/), by analogy with yeet. Figure 1 provides their ranking (in ascending order) according to Qumin, with the associated probabilities.

Twote 0.028 < tweet 0.056 < twet 0.056 < tweeted 0.86
Figure 1. Qumin’s ranking of the probability for potential past forms of to tweet
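
To make the idea of a pattern more concrete, here is a minimal Python sketch of this kind of analogical ranking. It is not Qumin's algorithm: it works on spelling rather than sounds, uses a handful of toy verbs instead of the CELEX data, and treats "sounds like tweet" as "ends in the same letter", all of which are simplifying assumptions. The names `pattern` and `rank_pasts` are hypothetical.

```python
from collections import Counter
from os.path import commonprefix

# Toy (present, past) pairs in ordinary spelling; NOT the CELEX data.
TOY_VERBS = [
    ("greet", "greeted"), ("seat", "seated"), ("heat", "heated"),
    ("treat", "treated"), ("wait", "waited"),
    ("meet", "met"), ("beat", "beat"), ("yeet", "yote"),
]


def pattern(present, past):
    """Return the (old_ending, new_ending) contrast between two forms."""
    stem = commonprefix([present, past])
    return present[len(stem):], past[len(stem):]


def rank_pasts(target, verbs):
    """Rank candidate pasts for `target` by pattern frequency among neighbours."""
    # Assumed similarity criterion: same final letter as the target.
    neighbours = [(pres, past) for pres, past in verbs if pres[-1] == target[-1]]
    counts = Counter(pattern(pres, past) for pres, past in neighbours)
    # Keep only the patterns whose old ending actually occurs in the target.
    usable = {p: n for p, n in counts.items() if target.endswith(p[0])}
    total = sum(usable.values())
    candidates = [
        (target[:len(target) - len(old)] + new, n / total)
        for (old, new), n in usable.items()
    ]
    return sorted(candidates, key=lambda c: c[1], reverse=True)


ranking = rank_pasts("tweet", TOY_VERBS)
for form, probability in ranking:
    print(form, probability)
```

On this toy list the same four candidates appear (tweeted, twet, tweet, twote), with tweeted far ahead. The toy probabilities are not comparable to those in Figure 1.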

As we can see, Qumin finds twote to be the least likely solution. This is a reasonable position overall (indeed, tweeted is the regular form), so why would both the official Twitter account and many Twitter users (including several linguists) prefer twote to tweeted?

But Qumin has no idea what is cool, a factor which makes yeet/yote (already a slang word, used on the internet) a particularly appealing choice. Moreover, Qumin has no access to semantic similarity, which could also play a role. Verbs that have similar meanings can be preferred as support for the analogy. In the current case, both speak/spoke and write/wrote have similar pasts to twote, which might help make it sound acceptable. Some speakers seem to be aware of these factors, as seen in the tweet above.

What about usage?

Are most speakers aware of the variant twote and using it? Before concluding that the model is mistaken, we need to observe what speakers actually use. Indeed, only usage truly determines “what is the past of tweet”. For this, I turn to (automatically) sifting through Twitter data.

Speakers must choose between tweeted and twote: what a dilemma!

A few problems: first, the form “tweet” is also a noun, and identical to the present tense of the verb. Second, “twet” is attested (sometimes as “twett”), but mostly as a synonym for the noun “tweet” (often in a playful “lolcat” style), or as a present verbal form, with a few exceptions, usually of a meta nature (see tweets below). I couldn’t find a way to automatically distinguish these from past forms while also managing within the Twitter API limits. Thus, I left out both from the search entirely. This leaves only our two main contestants.

 

I extracted as many recent tweets containing tweeted or twote as Twitter would let me: around 300 000 tweets twotten between the 26th of August and the 3rd of September. 186777 tweets remained after refining the search[4]. Of these, fewer than 5000 contain twote:

There were more than 180000 occurrences of tweeted and fewer than 5000 of twote in the past few days.
Counts of tweets containing either of two possible pasts for the verb “to tweet” in the past few days on Twitter (mentions excluded).

As you can see, the tweeted bar completely dwarfs the other one. However amusing and fitting twote may be, and despite @Twitter’s prescription (but conforming with Qumin’s prediction), the regular past form is by far the most used, even on the platform itself, which lends itself to playful and impactful statements. This easily closes this particular English Past Tense Debate. If only it were always this simple!

  1. The English verb data I used includes only the present and past tenses, and is derived from the CELEX2 dataset, as used in my PhD dissertation and manually supplemented by the forms for “yeet”. The CELEX2 dataset is commercial, and I cannot distribute it.
  2. The code I used for this blog post is available here, but not the dataset itself. Note that for scientific reasons I won’t discuss here, this software works on sounds, not orthography.
  3. One last possibility has been ignored by this polite software, a form which follows the pattern of sit/sat. I see it used from time to time for its comic effect, but it does not seem at all frequent enough to be a real contestant (and I do not recommend searching this keyword on Twitter).
  4. Since there has been a lot of discussion on the correct form, I exclude all clear cases of mentions. I count as mentions any occurrences wrapped in quotations, co-occurring with alternate forms, mentioning past tense, or with a hashtag. Moreover, with the forms in –ed, it is likely that the past participle would be identical, but for twote, the past participle could well be twotten. To reduce the bias due to the presence of more past participles in the usage of tweeted, I also exclude all contexts where the word is preceded by the auxiliary forms has, have, had, is, are, was, were, possibly separated by an adverb.
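
The exclusion rules in footnote 4 can be approximated with a few regular expressions. The following Python sketch is only an illustration under stated assumptions, not the code used for the counts above: the list of alternate forms, the rough definition of "adverb", and the reading of "with a hashtag" as "the form itself is hashtagged" are all guesses, and the names `is_usage` and `count_usages` are hypothetical.

```python
import re

FORMS = ("tweeted", "twote")
# Rival forms whose presence suggests the tweet discusses the forms (assumed list).
ALTERNATES = ("tweeted", "twote", "twet", "twotten")
AUX = r"(?:has|have|had|is|are|was|were)"
# Rough stand-in for "an adverb": a few common ones, or any word ending in -ly.
ADVERB = r"(?:just|already|never|ever|also|not|\w+ly)"
# Straight and curly quotation marks.
QUOTE = '["\'“”‘’]'


def is_usage(text, form):
    """True if `text` seems to use `form` as a past tense rather than mention it."""
    low = text.lower()
    if not re.search(rf"\b{form}\b", low):
        return False
    # Mentions: the form is quoted or hashtagged.
    if re.search(rf"(?:{QUOTE}|#){form}\b|\b{form}{QUOTE}", low):
        return False
    # Mentions: explicit talk of the past tense, or a rival form in the same tweet.
    if "past tense" in low:
        return False
    if any(re.search(rf"\b{alt}\b", low) for alt in ALTERNATES if alt != form):
        return False
    # Likely past participles: auxiliary (+ optional adverb) right before the form.
    if re.search(rf"\b{AUX}\s+(?:{ADVERB}\s+)?{form}\b", low):
        return False
    return True


def count_usages(tweets):
    """Count, for each form, the tweets that pass the filter."""
    return {form: sum(is_usage(t, form) for t in tweets) for form in FORMS}


sample = [
    "I twote about it yesterday",
    'Is "twote" even a word?',
    "I have already tweeted this",
    "He tweeted a photo",
    "She tweeted it, or twote it?",
]
print(count_usages(sample))
```

A filter like this errs on the side of discarding tweets: any tweet that looks like a discussion of the forms is dropped, even if it also happens to use one of them.
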
Word games

You have almost certainly heard about Wordle, the viral word game by powerlanguage, recently bought by the NYT. In the original game, a 5-letter English word is secretly chosen every day, which players attempt to guess in 6 tries. Each guess is answered by colored cues: green for “correct letter in the correct place”, orange for “correct letter in the wrong place”, gray for “incorrect letter”. The concept of wordle is not new, and resembles games such as Jotto, Lingo, and Mastermind.

A sample game of Mastermind.
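
The colored cues described above are easy to compute. Here is a minimal Python sketch, not the code of any particular wordle. The handling of repeated letters (a letter only gets an orange cue while unmatched copies of it remain in the target) follows common implementations and is an assumption, since the rules above do not spell it out. The function name `score` is hypothetical.

```python
from collections import Counter


def score(guess, target):
    """Return one cue per letter of `guess`: 'green', 'orange' or 'gray'."""
    if len(guess) != len(target):
        raise ValueError("guess and target must have the same length")
    cues = ["gray"] * len(guess)
    # Target letters that were not guessed in place remain available for orange cues.
    remaining = Counter(t for g, t in zip(guess, target) if g != t)
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            cues[i] = "green"
        elif remaining[g] > 0:
            cues[i] = "orange"
            remaining[g] -= 1
    return cues


print(score("tweet", "twote"))
```

Because `guess` and `target` are only iterated over, they can be plain strings or lists of letters, which matters for orthographies where one letter takes several characters.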

While some may have been annoyed by the endless stream of three-color square emojis reporting players’ success and inundating social media, I have been delighted by the productivity displayed by the many variants: in hello wordl, play an endless number of games; in dordle, quordle, octodle, guess several words at once; in squardle, play in two dimensions; in nerdle, guess a mathematical formula; in absurdle, the game does its best to get away from your guesses, etc.

Quordle lets you play 4 games at once

Some derived games transform the game mechanics, but the simplest variation is to switch the vocabulary (have you tried queerdle or lordle of the rings?) or the language. Indeed, Wikipedia already references more than 40 wordle language variants. If my social feeds are to be believed, many linguists have found that they were able to play in languages that they didn’t speak, provided that they had some intuitions of the phonotactics and orthographic sequences. I was however quite disappointed to see that many versions retained the English-centric assumption that one letter is one Unicode character, and avoided diacritics altogether, leading to strangely impoverished typography; this is the case, for example, of the French wordle, “le mot”.

 

The French wordle accepts “meler”, but not “melez”

 

While playing variants, I realized that a wordle is only as good as its word list: some games rely on lexicons which contain only citation forms (infinitives for French verbs) and exclude the many other inflected forms, leading to a frustrating game experience. For example, in Le Mot, one can play mêler (or more exactly, meler) “to mix”, but not meles “(you) mix”. It happens that well-curated word lists including inflected variants are a Surrey Morphology Group specialty: lexicons and dictionaries are a common product of language documentation, and as its name indicates, researchers at the SMG have a particular focus on morphology. We have been maintaining open inflectional databases since the 90s. After discussion, we agreed collectively to start by producing two wordle-like games, corresponding to the two main lexicons in the SMG databases, respectively the Dictionary of Archi and the Nuer Lexicon.

Nuerdle interface
SMG wordle in Nuer: Nuerdle

The Nuer language, or Thok Nath, is a West Nilotic language spoken by approximately 900,000 to two million people in South Sudan and Ethiopia, as well as in diaspora communities throughout the world. The SMG has created an interactive online dictionary for it. From this lexicon, I have extracted 6218 words, mostly verbs and nouns, with a few other parts of speech represented. All targets are taken from this set of words. However, accepting only words from the lexicon would risk rejecting a lot of words that speakers know but that the lexicon does not document. Thus, I also extracted all of the words from the Nuer translation of the Bible[1]. This led to a total lexicon of 13476 words[2].
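
The two-tier word list described above (targets from the lexicon only, accepted guesses from the lexicon plus running text) can be sketched in a few lines of Python. This is an illustration under assumptions, not the script actually used: the tokenisation (split on whitespace, keep only letters and combining marks) and the function names `corpus_words` and `build_word_lists` are hypothetical.

```python
import unicodedata


def corpus_words(text):
    """Yield word tokens from running text, skipping capitalised ones."""
    for raw in text.split():
        # Drop punctuation, symbols and digits; keep letters and combining marks.
        word = "".join(
            ch for ch in raw if unicodedata.category(ch)[0] in ("L", "M")
        )
        # Skip capitalised tokens to avoid proper names
        # (sentence-initial words are lost as a side effect).
        if word and not word[0].isupper():
            yield unicodedata.normalize("NFC", word)


def build_word_lists(lexicon_words, corpus_text):
    """Return (targets, accepted): targets come from the lexicon only."""
    targets = {unicodedata.normalize("NFC", w) for w in lexicon_words}
    accepted = targets | set(corpus_words(corpus_text))
    return targets, accepted


# Placeholder text, not real Nuer data (only "ri̱et" comes from the lexicon).
targets, accepted = build_word_lists(["ri\u0331et"], "Abc def, ri\u0331et. Ghi jkl!")
print(sorted(accepted))
```

Keeping the two sets separate means a player is never asked to find an undocumented word, but is not punished for guessing one.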

Archidle interface
SMG wordle in Archi: Archidle

Archi is a Daghestanian language of the Lezgic group spoken by about 1200 people in Daghestan. At the SMG, we created a dictionary of Archi, with entries in Russian, English, and Archi (both orthographic and phonetic forms), from which I extracted 3626 words for our wordle puzzle. For now, we do not have any more words for Archi, but we are working on it. In the game, we have ignored the stress diacritics, which might not be intuitive enough for speakers.

Two Nuer Keyboards. On the left, from a mobile app. On the right, our keyboard.
Nuer keyboards: from a mobile app (left), or from our wordle game (right).

In order to create the SMG wordles, I started from the open source code of the re-playable version, hello wordl. In order to keep the game closer to its original, I removed the re-playable function. However, I did keep the option to play a range of word lengths from 4 to 7 letters. Each day, you can thus play 4 games in each language. A main challenge was that the Nuer orthography comprises diacritics, which required rewriting large parts of the game, as it previously assumed that each letter could be written with a single character. Another difficulty came from the fact that neither language has a unique, widely used, keyboard layout. For Nuer, we created one based on a mobile keyboard, which we extended to include more diacritics.
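
To see why the one-letter-one-character assumption breaks, consider ri̱et, “word” in Nuer: it has four letters but five Unicode characters, because the diacritic under the i is a separate combining character. The Python sketch below illustrates one way to split a word into letters; it is not the game's own code, it only handles combining diacritics, and the function name `letters` is hypothetical.

```python
import unicodedata


def letters(word):
    """Split a word into letters, attaching combining marks to their base."""
    out = []
    for ch in unicodedata.normalize("NFD", word):
        # unicodedata.combining() is non-zero for combining diacritics.
        if unicodedata.combining(ch) and out:
            out[-1] += ch
        else:
            out.append(ch)
    return out


word = "ri\u0331et"  # ri̱et, with U+0331 COMBINING MACRON BELOW
print(len(word), len(letters(word)))
```

With such a function, word length, the grid, and the coloring of cues can all be computed over letters rather than over raw characters.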

Two Cyrillic Keyboards. On the left, standard Russian layout. On the right, our keyboard for Archi.
Cyrillic keyboards: Russian keyboard from a mobile app (left), or Archi keyboard from our wordle game (right).

In both cases, we strove to make the game playable by learners, linguists, and curious people who do not speak Archi or Nuer. For this reason, we made the default word length 4 letters rather than 5, to make the game easier. Moreover, we added short English definitions for all words in our lexicons, with links to their full definitions in our resources. Words in Nuer from the Bible are not always present in our Nuer lexicon, and hence, some words in Nuer can appear without translations. Finally, in order to help beginners get started, we provide a few example words of the correct length each day, hidden by default, which can be used to start playing.

Ri̱et: "word" in Nuer
A word played in Nuerdle, with translation in the margin

Besides learning the languages, scouring the dictionary, or using the words given as hints daily, how can you get better at the Nuer or Archi wordle? It helps to pay attention to the frequency of each letter, and to try words with frequent letters, in order to reduce the pool of potential words quickly. For the English wordle, some have calculated the optimal starting word. Rather than risk spoiling the game, I provide below the relative frequencies of the 5 most frequent letters at each position (1 to 7) in Nuerdle and Archidle words. The colors are assigned according to overall frequency in the lexicon, with light greens more frequent than dark blues. Each bar covers one word position and ignores the other, less frequent letters. The height of each stacked colored segment, between two white lines, represents the letter’s frequency: e.g. in Nuer, a word in our lexicon starts with k around 10% of the time, and with around 12% of the time. If there is some interest, a future blog post could explore further the frequent sequences and letter patterns in either language.

Frequency of each character in Nuer words in our lexicon, per position
Frequency of each character in Archi words in our lexicon, per position
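
Per-position frequencies of this kind are straightforward to compute from a word list. The Python sketch below is an illustration, not the script behind the figures: it assumes that each frequency is relative to the number of words long enough to have that position, and the function name `top_letters_by_position` and the toy words are hypothetical.

```python
from collections import Counter


def top_letters_by_position(words, n=5):
    """For each position (from 1), return the n most frequent letters
    with their relative frequencies among words that have that position."""
    by_position = {}
    for word in words:
        for i, letter in enumerate(word, start=1):
            by_position.setdefault(i, Counter())[letter] += 1
    result = {}
    for position, counts in sorted(by_position.items()):
        total = sum(counts.values())
        result[position] = [
            (letter, count / total) for letter, count in counts.most_common(n)
        ]
    return result


# Toy words only; each word may be a string or a list of letters.
frequencies = top_letters_by_position(["kat", "kit", "bat"])
print(frequencies[1])
```

For orthographies with diacritics, each word should be passed in as a list of letters rather than as a raw string, so that a letter and its diacritic are counted together.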

Finally, since this is a morphology blog, I would like to draw your attention to the interesting way in which English acquired a new -dle suffix. The original game is called wordle, a combination of the creator’s last name Wardle, and of word. As the game became viral, the apparent suffix has come to mean “game in the wordle family” (or maybe “online guessing game”). Interestingly, even though the most obvious decomposition of wordle seems to be word+le, the productive suffix is -dle, not -le. Could this be because the family resemblance in the new words is more obvious when they keep more material in common? Isn’t analogy mysterious? In any case, after hesitating with ri̱etle (from ri̱et “word”+le, in Nuer) and č’atle (from č’at, “word” in Archi), we settled instead on calling our games Archidle and Nuerdle.

 

  1. Excluding words starting with a capital, in order to avoid proper names.
  2. If you want to suggest missing Nuer words, the Nuer lexicon has a module for suggestions!