Archi – MORPH

Linguistic fieldwork in the Russian Federation

February 9, 2023 Marina Chumakina

The Surrey Morphology Group, though a relatively small research group, conducts linguistic fieldwork on all (inhabited) continents. Countries where members have worked over the years include Australia, Bulgaria, Canada, Colombia, Kenya, Mexico, Namibia, Nepal, Nigeria, Russia, Serbia, and Vanuatu. Fieldwork in every region has its peculiarities, not necessarily connected to the linguistic properties of the language(s) studied, and it is the peculiarities of one such region which I would like to discuss today.

My personal fieldwork experience has involved several different regions of Russia, in the republics of Daghestan, Mari-El, Komi and Khakassia. Each of these regions has been fascinating in its own way, but Daghestan takes the lion’s share of the fieldwork I do. It is a mountainous region in the south of Russia stretching from the Caspian Sea to the Caucasus. It has borders with Azerbaijan and Georgia to the south, and within the Russian Federation it is next door to Chechnya. Medieval geographers described the Caucasus as “a mountain of tongues”, and with good reason. There are over forty languages spoken in this relatively small territory (just 50,300 sq km), and most of the linguistic diversity lies within an even smaller mountainous region in the south of Daghestan, involving languages of the indigenous Nakh-Daghestanian family.

I wrote before about the linguistic interest of the language I have worked with the most, Archi (in many respects a typical representative of the family), but today I want to talk about social and cultural aspects of the work.

Culturally, Daghestan is a relatively homogeneous region; traditionally people lived in small villages, bred sheep and grew sturdy grains like rye and barley. Before the 20th century, many villages were organised as follows: there was one central village where people got together during summer months while the sheep were in the alpine pastures and did not need shelter during the night, and in winter months the people would go to smaller hamlets where the sheep (split into smaller groups) were kept in the houses or in underground sheepfolds made in the caves. The name for these winter sheepfolds is the same across several Daghestanian languages, so we can safely assume this was common practice for a long time.

After the Revolution of 1917 and the creation of the Soviet Union, many people got the opportunity to drive the sheep to regions with a milder climate near the Caspian Sea, and these shepherding practices ceased to exist. The smaller hamlets either disappeared or grew into proper villages, and in the latter case developed some dialectal differences. The people like to notice those differences but at the same time they still often perceive the conglomerate of the central village and the “hamlets” (which in some cases are even larger than the central village) as a single village.

Besides sheep breeding, Daghestanian people grew grain, and traditionally they would roast grains and make flour out of them. That flour can be mixed with water and then eaten directly, and in some villages they still make this “shepherd’s food” (they call it “old time instant noodles”). There were also many traditional crafts, among which are the silver products of Kubachi, wood inlaid with silver from Untsukul, Lezgian knitted slippers and earthenware from Balkhar.

From a sociolinguistic point of view, the Daghestanian languages were in a much better state during the 20th century than many other smaller languages of Russia. Although only a handful of Daghestanian languages were recognised by the state and therefore taught at school, children in smaller language communities remained monolingual until well into their teens. Most Daghestanian people belonging to smaller language groups also speak the language of a larger Daghestanian neighbour (such as Avar, Dargwa or Lezgian) and one national language, whether Russian, Azeri or Georgian, although in the last 50 years Russian has been steadily coming to replace the others. The first thing that strikes a linguist who comes to Daghestan (especially if that linguist has experience of working with small languages in other parts of the Russian Federation) is how proud the people are of their languages, how ready they are to share them, how much delight they take in their complexity. Indeed, since they all speak at least one other language, they can well see that their languages are more complex, at least phonetically (for example, Archi has 70 consonants).

Some places in Daghestan have kept their traditional ways better than others: thus, in 2004, when I first came to Archi, I was really fascinated to see many women wearing traditional clothes and jewellery not only on special occasions but every day.

Living in people’s houses, I could see that they used traditional cots for babies and had retained most of the old practices connected with childbirth. For example, right before having her first baby, the woman goes to her mother’s house and stays there for the first 40 days of the baby’s life, being completely looked after (very often she just stays in bed). After 40 days, she moves back to her and her husband’s house in a very colourful procession: the whole thing is called “moving of the cot”.

But maybe the most important cultural characteristic of Daghestan is the living cultural practice of protecting one’s guests. The practice stems from old times when travelling in the Caucasus mountains was not always safe: if one happens to come to a Daghestanian village, one will be invited into a house, given food and shelter, and will become kunaks with the master of the house. Kunak is not easy to translate. It means ‘guest’, but also ‘friend’. So I can say “I have a kunak in that village” meaning there is a person there who once was my guest (or vice versa) and now we are friends, so I can always count on having food and shelter in his house as much as he can in mine. In former times it was a duty for the master of the house to protect his kunak such that if anything were to happen to him, the perpetrator of the bad deed would answer to the house where the guest was staying. This system is still very much alive in Daghestan, and once I had eaten or slept in somebody’s house, I knew that I would be safe in that village and probably the neighbouring ones too.

Word games

March 9, 2022 Sacha Beniamine

You have almost certainly heard about Wordle, the viral word game by powerlanguage, recently bought by the NYT. In the original game, a 5-letter English word is secretly chosen every day, which players attempt to guess in 6 tries. Each guess is answered by colored cues: green for “correct letter in the correct place”, orange for “correct letter in the wrong place”, gray for “incorrect letter”. The concept of Wordle is not new, and resembles games such as Jotto, Lingo, and Mastermind.
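The colored cues can be sketched in a few lines of Python. This is only an illustration of the rules described above, not the code of any actual game; the function name `score_guess` and the way repeated letters are handled (each letter of the target can justify at most one cue) are assumptions.

```python
from collections import Counter


def score_guess(guess, target):
    """Return one cue per letter of the guess: 'green', 'orange' or 'gray'."""
    if len(guess) != len(target):
        raise ValueError("guess and target must have the same length")
    cues = ["gray"] * len(guess)
    # Target letters that were not matched in place stay available for orange cues.
    remaining = Counter(t for g, t in zip(guess, target) if g != t)
    # First pass: correct letter in the correct place.
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            cues[i] = "green"
    # Second pass: correct letter in the wrong place, each target letter used once.
    for i, g in enumerate(guess):
        if cues[i] != "green" and remaining[g] > 0:
            cues[i] = "orange"
            remaining[g] -= 1
    return cues


print(score_guess("crane", "caper"))
```

Under these assumptions, guessing "crane" when the secret word is "caper" yields green, orange, orange, gray, orange.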

While some may have been annoyed by the endless stream of three-color square emojis reporting players’ success and inundating social media, I have been delighted by the productivity displayed by the many variants: in hello wordl, play an endless number of games; in dordle, quordle, octodle, guess several words at once; in squardle, play in two dimensions; in nerdle, guess a mathematical formula; in absurdle, the game does its best to get away from your guesses, etc.

Some derived games transform the game mechanics, but the simplest variation is to switch the vocabulary (have you tried queerdle or lordle of the rings?) or the language. Indeed, Wikipedia already references more than 40 wordle language variants. If I believe my social feeds, many linguists have found that they were able to play in languages that they didn’t speak, provided that they had some intuitions of the phonotactics and orthographic sequences. I was however quite disappointed to see that many versions retained the English-centric assumption of one letter per Unicode character, and avoided diacritics altogether, leading to strangely impoverished typography; this is the case, for example, of the French wordle, “le mot”.

The French wordle accepts “meler”, but not “melez”

While playing variants, I realized that a wordle is only as good as its word list: some games rely on lexicons which contain only citation forms (infinitives for French verbs) and exclude the many other inflected forms, leading to a frustrating game experience. For example, in Le Mot, one can play mêler (or more exactly, meler) “to mix”, but not meles “(you) mix”. It happens that well-curated word lists including inflected variants are a Surrey Morphology Group specialty: lexicons and dictionaries are a common product of language documentation, and as its name indicates, researchers at the SMG have a particular focus on morphology. We have been maintaining open inflectional databases since the 90s. After discussion, we agreed collectively to start by producing two wordle-like games, corresponding to the two main lexicons in the SMG databases, respectively the Dictionary of Archi and the Nuer Lexicon.

Nuerdle interface — SMG wordle in Nuer: Nuerdle

The Nuer language, or Thok Nath, is a West Nilotic language spoken by approximately 900,000 to two million people in South Sudan and Ethiopia, as well as in diaspora communities throughout the world. The SMG has created an interactive online dictionary for it. From this lexicon, I have extracted 6218 words, mostly verbs and nouns, with a few other parts of speech represented. All targets are taken from this set of words. However, using only the lexicon would risk rejecting a lot of words the speakers might know, even though they are not documented in the lexicon. Thus, I also extracted all of the words from the Nuer translation of the Bible¹. This led to a total lexicon of 13476 words².

Archidle interface — SMG wordle in Archi: Archidle

Archi is a Daghestanian language of the Lezgic group spoken by about 1200 people in Daghestan. At the SMG, we created a dictionary of Archi, with entries in Russian, English, and Archi (both orthographic and phonetic forms), from which I extracted 3626 words for our wordle puzzle. For now, we do not have any more words for Archi, but we are working on it. In the game, we have ignored the stress diacritics, which might not be intuitive enough for speakers.

Two Nuer Keyboards. On the left, from a mobile app. On the right, our keyboard. — Nuer keyboards: from a mobile app (left), or from our wordle game (right).

In order to create the SMG wordles, I started from the open source code of the re-playable version, hello wordl. In order to keep the game closer to its original, I removed the re-playable function. However, I did keep the option to play a range of word lengths from 4 to 7 letters. Each day, you can thus play 4 games in each language. A main challenge was that the Nuer orthography comprises diacritics, which required rewriting large parts of the game, as it previously assumed that each letter could be written with a single character. Another difficulty came from the fact that neither language has a unique, widely used, keyboard layout. For Nuer, we created one based on a mobile keyboard, which we extended to include more diacritics.
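To see why the one-character-per-letter assumption breaks, here is a minimal Python sketch that splits a word into letters by attaching each combining diacritic to the character before it. It is not the code used in Nuerdle or Archidle; the function name `split_letters` is hypothetical, and the sketch assumes that every diacritic is a Unicode combining mark (it would not group multi-character letters that are written without combining marks).

```python
import unicodedata


def split_letters(word):
    """Group each base character with the combining marks that follow it."""
    letters = []
    # NFD decomposition separates precomposed characters into base + marks.
    for ch in unicodedata.normalize("NFD", word):
        if letters and unicodedata.combining(ch):
            letters[-1] += ch  # attach the diacritic to the previous letter
        else:
            letters.append(ch)
    return letters


word = "ri\u0331et"  # ri̱et, "word" in Nuer: i followed by a combining macron below
print(len(word), len(split_letters(word)))
```

The string holds five Unicode characters, but the player sees and types four letters, which is the count the game grid has to use.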

Two Cyrillic Keyboards. On the left, standard Russian layout. On the right, our keyboard for Archi. — Cyrillic keyboards: Russian keyboard from a mobile app (left), or Archi keyboard from our wordle game (right).

In both cases, we strove to make the game playable by learners, linguists, and curious people who do not speak Archi or Nuer. For this reason, we made the default word length 4 letters rather than 5, to make the game easier. Moreover, we added short English definitions for all words in our lexicons, with links to their full definitions in our resources. Words in Nuer from the Bible are not always present in our Nuer lexicon, and hence, some words in Nuer can appear without translations. Finally, in order to help beginners get started, we provide a few example words of the correct length each day, hidden by default, which can be used to start playing.

Ri̱et: "word" in Nuer — A word played in Nuerdle, with translation in the margin

Besides learning the languages, scouring the dictionary, or using the words given as hints daily, how can you get better at the Nuer or Archi wordle? It helps to pay attention to the frequency of each letter, and to try to play words with frequent letters, in order to reduce the pool of potential words quickly. For the English wordle, some have calculated the optimal starting word. Rather than risk spoiling the game, I provide below the relative frequencies of each of the 5 most frequent letters, for each position (1 to 7) in Nuerdle and Archidle words. This should give an idea of frequent letters at each position. The colors are assigned according to overall frequency in the lexicon, with light greens more frequent than dark blues. Each bar represents the frequencies of the five most frequent letters in a word position (from 1 to 7), ignoring the other, less frequent letters. Each stacked colored bar’s height, between two white lines, represents the letter’s frequency: e.g. in Nuer, a word in our lexicon starts with k around 10% of the time, and with t around 12% of the time. If there is some interest, a future blog post could explore further the frequent sequences and letter patterns in either language.

Frequency of each character in Nuer words in our lexicon, per position

Frequency of each character in Archi words in our lexicon, per position
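For readers who would like to compute this kind of per-position count on a word list of their own, a minimal Python sketch follows. It is an assumption about how such figures can be derived, not the script that produced the charts; the function name `positional_frequencies` and the four toy words are invented for illustration and are not taken from the Nuer or Archi lexicons.

```python
from collections import Counter


def positional_frequencies(words, top=5):
    """Map each 1-based position to its `top` most frequent letters.

    Each word is a sequence of letters (a plain string, or a list of letters
    if some letters span several characters). Frequencies are relative to the
    number of words long enough to have that position.
    """
    counts = {}
    for word in words:
        for position, letter in enumerate(word, start=1):
            counts.setdefault(position, Counter())[letter] += 1
    result = {}
    for position, counter in sorted(counts.items()):
        total = sum(counter.values())
        result[position] = [(letter, n / total) for letter, n in counter.most_common(top)]
    return result


toy_words = ["kat", "kit", "tak", "kot"]  # invented words, for illustration only
for position, letters in positional_frequencies(toy_words).items():
    print(position, letters)
```

A good opening guess would then combine letters that rank high at their respective positions.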

Finally, since this is a morphology blog, I would like to draw your attention to the interesting way in which English acquired a new -dle suffix. The original game is called wordle, a combination of the creator’s last name Wardle, and of word. As the game became viral, the apparent suffix has come to mean “game in the wordle family” (or maybe “online guessing game”). Interestingly, even though the most obvious decomposition of wordle seems to be word+le, the productive suffix is -dle, not -le. Could this be because the family resemblance in the new words is more obvious by keeping more common material? Isn’t analogy mysterious? In any case, after hesitating with ri̱etle (from ri̱et “word”+le, in Nuer) and č’atle (from č’at, “word” in Archi), we settled instead on calling our games Archidle and Nuerdle.

¹ Excluding words starting with a capital, in order to avoid proper names.
² If you want to suggest missing Nuer words, the Nuer lexicon has a module for suggestions!

Are words all different? Or are they all the same?

November 25, 2020 Greville Corbett

Imagine we have less than a life-time to describe the words of a given language. We might start from the view that each individual word is a treasure to be described in exquisite detail. Indeed, it is one of the achievements of our field that linguists have found and described gems like the following:

Archi (Dagestan) has the word t’uq’ˤ, a stone post inside an underground sheepfold, which supports the stone roof.

In Archi the t’uq’ˤ is the stone post supporting the roof of a sheepfold (Photo credit: Dr. Marina Chumakina).

Soq (Papua New Guinea) has the verb s- ‘stay’, which is anti-irregular. While typical irregular verbs (like English go ∼ went) have unexpected forms but mean ‘the right thing’ (went means ‘go in the past’), the Soq verb s- ‘stay’ is the opposite of that. Its forms are unremarkable, but uniquely in the language, its present tense covers the time period of the English present (‘now’). All other verbs have a present tense (sometimes called ‘hodiernal’) which covers the period starting at nightfall yesterday and running through to and including ‘now’.
Krongo (Sudan) has the noun m-ùsí ‘sorcerer’, where the initial m- tells us it is singular. The plural is nú-kù-kk-ùs-óoní ‘sorcerers’ with no less than four plural markers, each of which is found independently with other nouns.
Russian skovorodá ‘frying pan’ seems remarkable only in that you have to wait for the last syllable to put the stress on the word. But in the plural, the stress moves forward three syllables: skóvorody ‘frying pans’, which makes it sound rather different.
English dust. Yes, even English has some star items. The humble verb dust is an example of ‘Gegensinn’, that is, it means its own opposite. We can dust a cake with icing sugar (that is, putting on particles), the opposite of dusting the furniture (removing particles).
Dusting – even elephants love to do it!

But there is a danger with this approach: we may well manage a few hundred items, and leave behind an unpublished dictionary. Or we may publish Volumes I-III (A-F), leaving the user stuck for words later in the alphabet: this happens particularly with larger projects, when grand intentions meet organizational and financial reality.

The alternative approach is to start from the assumption that all words in a language are the same. We soon discover, of course, that this is not quite true. There are dramatic generalizations to be made: we may find, for instance, that many words can occur alone, and some cannot. More generally, different classes of words have different properties of combination with others. That is, we specify part of speech information (verb, noun, and so on). Consistent with this, wholly or partly, we may find regularities such as some words distinguishing tense while others do not. And real dictionaries embody such regularities as defaults. If an English dictionary specifies that compute is a verb, it is taken as given that it will have a past tense, that the form will be computed, that this past tense will be compositional (we know that what it means is a combination of the lexical meaning of compute and the grammatical meaning of PAST). And when a default is overridden, the information is given in the dictionary entry. For example, the entry will state that the past tense of go is went (and only the form need be given, since our default assumption about what it means will hold good), or that binoculars is a noun but lacks a singular.
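The logic of defaults and overrides can be made concrete with a toy Python lexicon. Everything in it is an illustrative assumption: the names `DEFAULT_RULES`, `LEXICON` and `inflect` are made up, and the spelling rules are drastically simplified compared with what any real dictionary or grammar would need.

```python
# Default rules per part of speech: what an entry gets when it says nothing more.
DEFAULT_RULES = {
    "verb": {"past": lambda lemma: lemma + "d" if lemma.endswith("e") else lemma + "ed"},
    "noun": {
        "singular": lambda lemma: lemma,
        "plural": lambda lemma: lemma + "s",
    },
}

# Entries state the part of speech, plus only the information that overrides a default.
LEXICON = {
    "compute": {"pos": "verb"},
    "go": {"pos": "verb", "past": "went"},
    # None records that the form is lacking altogether.
    "binoculars": {"pos": "noun", "singular": None, "plural": "binoculars"},
}


def inflect(lemma, feature):
    """Return the form stated in the entry if there is one, else apply the default."""
    entry = LEXICON[lemma]
    if feature in entry:
        return entry[feature]
    return DEFAULT_RULES[entry["pos"]][feature](lemma)


print(inflect("compute", "past"), inflect("go", "past"), inflect("binoculars", "singular"))
```

The entry for compute is almost empty because the defaults do all the work, while go and binoculars carry exactly the exceptional information and nothing else.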

I have described not one, but two straw men, though I have met real people who came close to these extremes. The point is that the interest of the linguistic gems we started with comes precisely from the way in which they stand out against the backdrop of the general picture. We know that there are general defaults – otherwise speakers and hearers would not cope. We expect singular and plural of a noun to be linked by a simple formula, rather than by a stress-shift that dramatically changes the way the noun sounds, as with Russian skovoroda ‘frying pan’. So in principle we can start from either end (words are all different or words are all the same), so long as we have the other horizon in view too.

Don’t forget to destress when using a frying pan in Russian… But if you can’t take the heat, time to get out of the kitchen!

Of course, real people tend to feel more comfortable working from one end or the other; lexicographers are, arguably, more interested in the differences and linguists more in the generalizations. And there are important movements within the field where dictionary-makers point out the need for much more detailed grammatical information about individual words, and conversely where linguists point out that the broad classes we often work with need to be broken down into rather finer detail.

A saving grace in all this is the range of possibilities offered by online dictionaries. We can present some of the richness of words in new ways. For example, rather than trying to describe what the pillar that holds up the roof of an underground sheepfold looks like, we can give a picture. The online Archi dictionary does this. And it provides the sound file, so that users can hear what the word sounds like. Indeed they can hear all the basic forms needed to derive its large array of forms (its extensive paradigm). What if the system of sounds comprising the words has taken years of work to unravel? We want to hear the sounds and see the system. This is something – among other good things – that the new Nuer dictionary offers.

Browsing in the Archi and Nuer dictionaries makes us marvel at how different those words are, one from another, and perhaps from ‘our’ words. And yet they are all the same too – they all use the same Archi and Nuer systems of sounds, and they fall into parts of speech which are interestingly comparable to ‘our’ parts of speech (verbs and nouns are distinct, and so on).

It would need several lifetimes for anything approaching a ‘complete’ dictionary of Archi or Nuer. But there are plenty of surprises whichever perspective we take: the dictionary entries tell us about the amazing differences between languages, but the innocent little markers (like v. and n.), and the sets of forms given, point to the equally amazing sameness.

If you enjoyed this post, why not check out our favourite untranslatable words from the languages we work on.

What do we lose when we lose a language?

April 1, 2019 Marina Chumakina

By the end of this century we are likely to lose half of the world’s six thousand languages. With each lost language a whole world of thought, customs, traditions, poems, songs, jokes, myths, legends and history gets lost. Knowledge of local plants, herbs, mushrooms and berries, their medicinal and culinary uses disappears, together with names for small rivers, mountains, valleys and forests. And this is only a tiny fragment of what we lose when we lose a language.

For a linguist, the loss of a language is first and foremost the loss of a system with a unique set of properties and rules which make it work. If there are any universal principles behind the architecture of human language, our only hope to figure them out is by studying the multitude of languages still existing on the planet. And endangered languages – those that we were lucky enough to have time and resources to study – show us time and again how vast the range of linguistic variability is. For example, it has been thought and stated by linguists and psychologists that grammatical tense can be marked by verbs only, as hundreds and hundreds of languages behave this way. Then we discovered that Kayardild, a critically endangered language of Australia, marks tense on nouns as well as verbs, making us reconsider this ‘universal’.

Archi, a language spoken in one village in the highlands of Daghestan (Caucasus, Russia), is an endangered language which I have been working on since 2004. There are only about 1300 speakers of this language and, as far as we know, there never have been more than that. Yet for centuries it was spoken in the Archi village (below) and passed to younger generations without being under any threat.

Because the community is so small, no writing system was ever invented for Archi – people in the village did not need to write to each other, and all communication with outsiders happened in one of the larger languages of the area. Until the 1940s this was Lak, then Avar (two large languages of Daghestan), and in the past 40 years, these have been increasingly replaced by Russian. Archi people lived a hard but self-sufficient life keeping sheep in the mountains for themselves and for trading (the alpine pastures within walking distance of Archi village make their lamb hard to compete with) and growing grains, mostly rye, on terraces: narrow strips of land dug into the steep mountain slopes. These grains were just for their own consumption, as it was too hard a job to grow any more than they needed to survive.

We cannot even say that the arrival of television, mobile phones and the internet – which happened more or less at the same time in Archi – is responsible for language decline. It is just that life in the mountains is very hard, so the Archi people have started moving to the cities, abandoning their traditional way of life and their language. Since I started working with Archi, two of the village’s primary schools have been closed and others are struggling as young people continue to leave. Kids abandon Archi as soon as they go to school or nursery in town, and their parents tend to follow suit. Older people in the village still wear traditional dress and keep up traditional skills, but the younger generation is moving away from these traditions. And when the last school closes in the village and no more children live there, the language’s fate will be sealed.

What will we lose once Archi is lost? We will lose a verbal system which boasts the largest number of verb forms registered – an Archi verb has up to 1.5 million forms. With this, we will forever lose the opportunity to figure out how the human brain can operate such a humongous system; we won’t be able to watch children learning such a complex language, going through stages of acquisition, making telling mistakes and overgeneralisations (like English kids do when they go through the stage of producing forms like goed, readed, telled, eated etc.). We will have the knowledge that a system such as the Archi verb existed, but we will never know how it functioned.

We will lose a system of deictic pronouns (like English ‘this’ and ‘that’) which has five words in it. These mark not just the proximity to the speaker (like English this), but also the perspective of the listener, and the vertical position in regard to the speaker (see below). Even if these are not unique as lexical items, the whole linguistic system in which they operate is unique. We don’t know yet how these pronouns work in stories as opposed to conversation, and at the moment we have no good techniques to find this out.

jat	‘this, close to the speaker’
jamut	‘this, close to the hearer’
tot	‘that, far away from the speaker’
godot	‘that, far away and lower than the speaker’
ʁodot (the first sound is a bit like the French pronunciation of r)	‘that, far away and higher than the speaker’

We will lose a system where subject and object in the sentence work differently from what we are used to in European languages. In most European languages, the subjects of transitive and intransitive verbs have the same form (as in He arrived and He brought her along), while the object gets a different marking (She arrived vs. He brought her along). In Archi, the subject of an intransitive verb such as ‘arrive’ is marked the same as the object of a transitive verb such as ‘bring’:

Tuw qʷʕa ‘he arrived’

Tormi tuw χir uwli ‘She brought him’.

This is called Ergative-Absolutive alignment, and was first brought to the attention of linguists by the Australian language Dyirbal, which is now already dead. Several other linguistic families of the world use the same way of making sentences, including the Nakh-Daghestanian family to which Archi belongs. As not many Dyirbal materials have been recorded, it is Archi and other endangered Daghestanian languages that have been making linguists reconsider universals about subject, object and verb relations.
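The difference between the two patterns can be stated as a small decision rule, sketched here in Python. The role labels S, A and P (intransitive subject, transitive subject, transitive object), the term “nominative-accusative” for the familiar European pattern, and the function name `case_marking` are conventions and assumptions of this sketch, not part of any analysis of Archi given above.

```python
def case_marking(role, alignment):
    """Return the case category for a role under a given alignment.

    role: 'S' (subject of an intransitive verb), 'A' (subject of a
    transitive verb) or 'P' (object of a transitive verb).
    """
    if role not in ("S", "A", "P"):
        raise ValueError(f"unknown role: {role}")
    if alignment == "nominative-accusative":
        # S and A pattern together; the object stands apart.
        return "accusative" if role == "P" else "nominative"
    if alignment == "ergative-absolutive":
        # S and P pattern together; the transitive subject stands apart.
        return "ergative" if role == "A" else "absolutive"
    raise ValueError(f"unknown alignment: {alignment}")


for alignment in ("nominative-accusative", "ergative-absolutive"):
    print(alignment, {role: case_marking(role, alignment) for role in "SAP"})
```

In the Archi sentences above, tuw appears both as the one who arrived (S) and as the one who was brought (P), which is the pairing the second branch describes.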

This is only a glimpse of the impact that endangered languages have on linguistics as a discipline. In the last few decades, linguists have become much more aware of how invaluable endangered languages are and how fragile their futures, and more and more efforts are now directed to documenting and – whenever possible – preserving the linguistic diversity of the world.

MORPH

A blog about languages and how they change
