
Yesterday, Today and Tomorrow

How do we talk about time? This may seem a simple question with a simple answer; we are all human, surely we all experience time the same way? That may be true, but that doesn’t mean that all languages organise time in the same way. This is arguably most apparent when it comes to talking about the days either side of the present day. We all live on Earth and so all experience a day-night cycle; we can all understand how one day follows another. However, the words we use to locate events in this cycle can vary wildly in their construction.

Let’s take a look at two languages, Scottish Gaelic and Sylheti, and see how their systems compare with that of English. All three of these languages belong to the same family, Indo-European, so it might be assumed that they show many similarities. And yet they still differ significantly in how they talk about time.

Firstly, Scottish Gaelic. Like English, it distinguishes between ‘yesterday’, ‘today’ and ‘tomorrow’. The terms show a consistent structure, combining a frozen prefix a(n)- with three morphologically opaque roots: an-dè, an-diugh and a-màireach respectively. Furthermore, none of the Gaelic terms has any connection with the normal word for ‘day’, latha/là. Compare English, where yester-day and to-day both feature the word ‘day’, while to-day and to-morrow both feature a frozen prefix to- (historically a demonstrative). Additionally, there are single terms for ‘last night’ and ‘tonight’, a-raoir and a-nochd respectively, again with no immediately apparent connection with the normal term for ‘night’, oidhche. On the other hand, there is no single term for ‘tomorrow night’, so the compound expression oidhche a-màireach is used instead. There are also terms for ‘the day after tomorrow’ and ‘the day before yesterday’, an-earar and a bhòn-dè respectively, and the latter has a counterpart in a bhòn-raoir for ‘the night before last’. English is also reported to have had similar terms in the form of ereyesterday and overmorrow, though these have fallen out of use.

In another respect, Gaelic is slightly more regular than English in how it refers to parts of the day. While in English we have a split between ‘this morning’ and ‘yesterday morning’, Gaelic instead uses madainn an-diugh and madainn an-dè, where the former literally translates to ‘today morning’.

But all this is not really that surprising. All that really distinguishes Scottish Gaelic from English in this respect is which time categories are given single indivisible terms rather than compositional expressions; the fundamental organisation of the system is still broadly similar to English. To see a far more radically different system of organising time words, we will now turn to Sylheti, an Indo-Aryan language spoken in north-eastern Bangladesh by around 9-10 million and by perhaps a further 1 million in diaspora, including by most of the British Bangladeshi community.

Here, instead of distinguishing between ‘yesterday’ and ‘tomorrow’, we instead find a single term xail(ku), contrasting with aiz(ku) meaning ‘today’ (the -ku is a suffix which can optionally appear on a lot of ‘time’ words, such as onku ‘now’ or bianku ‘(this) morning’). The two senses of ‘tomorrow’ and ‘yesterday’ can be distinguished by combining them with goto ‘past’ and agami ‘future’, but just as commonly instead the distinction is solely marked by whether the verb is in the past or future tense, e.g. xailku ami amar bondu dexsi ‘I saw my friend yesterday’ vs. xailku ami amar bondu dexmu ‘I will see my friend tomorrow’.

This is not an isolated instance in the language, either, but in fact represents a consistent trend. So in the same manner foru can be either ‘the day before yesterday’ or ‘the day after tomorrow’ depending on context and toʃu the same but at one day further removed.
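The logic of this symmetric system can be sketched in a few lines of Python. This is only an illustration built from the forms quoted above: the function name and the tense labels are hypothetical, and stripping the optional -ku from every term is an assumption (the post only shows it on xail and aiz).

```python
# Symmetric day terms from the post: each names a distance from today,
# not a direction in time.
DAY_DISTANCE = {"aiz": 0, "xail": 1, "foru": 2, "toʃu": 3}


def day_offset(term: str, tense: str) -> int:
    """Return a signed day offset (negative = past) for a day term and verb tense."""
    # Assumption: the optional -ku suffix is simply stripped if present.
    base = term[:-2] if term.endswith("ku") else term
    distance = DAY_DISTANCE[base]
    if distance == 0:
        return 0  # 'today' needs no disambiguation
    if tense == "past":
        return -distance
    if tense == "future":
        return distance
    raise ValueError(f"unknown tense label: {tense!r}")


print(day_offset("xailku", "past"))    # 'yesterday'
print(day_offset("xailku", "future"))  # 'tomorrow'
```

The word itself only supplies the distance; the sign comes entirely from the tense of the verb, which is the point of the examples with dexsi and dexmu.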

Table of day and night terms in English, Gaelic and Sylheti respectively
Visualising the systems

Nor is Sylheti unique in using this kind of system; it is also found in many parts of New Guinea, for example. Yimas, a language of northern New Guinea, also uses the same term ŋarŋ for both ‘yesterday’ and ‘tomorrow’, urakrŋ for ‘two days removed’ and so on, all the way up to manmaɲcŋ for ‘five days removed’. Once again whether the reference is to the past or future is carried by the choice of tense on the verb, though Yimas has a far more complex system than that seen in Sylheti, for instance distinguishing a near past -na(n) from a more remote past -ntuk~ntut.

Sylheti also has finer-grained distinctions for parts of the day than either English or Scottish Gaelic. For example, if one wishes to say ‘in the morning’ one must decide whether one is talking about the early morning (ʃoxal) or the mid to late morning (bian). Additionally, while forms such as ‘yesterday/tomorrow afternoon’, ‘the night before last/after next’ and ‘yesterday/tomorrow morning’ use compound expressions (xail madan, foru rait and xail bian/ʃoxal respectively), to express ‘this morning/this afternoon/tonight’ the word for the part of the day (perhaps with the oblique suffix -e or a time suffix -ku) is sufficient by itself, for example amra ʃoxale Sylheʈ aisi ‘We arrived in Sylhet this morning’ or ami raitku dua xotram ‘I am praying tonight’ (with rait ‘night’).

This is just one small part of the temporal vocabulary, and only looking at representatives from a single family, and yet already we see great variation in how time is organised and discussed. It is not so much that these groups have fundamentally different conceptions of time, as these languages share a common ancestor and are only separated by a few thousand years. Instead, it is a testament to the fluidity of time itself, resulting in the words used to refer to it easily shifting in meaning and being reorganised over generations.

Sign language mythbusters

We have all heard of sign languages. Most of us have seen people talking to each other using their hands and body movements instead of the voice: on the street, at a train station, or in a noisy café. We probably even felt a slight jolt of envy, thinking about how much easier it must be for them to communicate when they are surrounded by loud music, laughter, and chatter. Curiously, however, very few people know what sign languages actually are. Unless you are a sign language user and/or a linguist, you probably have a lot of misconceptions about their nature. For this reason, linguists who write about sign languages often begin their books with a discussion of myths and misconceptions. For example, Robbin Battison wrote a section on misconceptions about ASL, Trevor Johnston and Adam Schembri covered the same topic for Australian Sign Language, and Vadim Kimmelman and Svetlana Burkova discussed common mistakes in light of Russian Sign Language. Let us follow their example and bust a few myths!

Myth №1 There is only one sign language

Perhaps the most mind-blowing thing about sign languages is that there is more than one. Indeed, if we have never encountered sign languages in action, we most probably have a default assumption that there is one sign language, and everyone is using it. Why would you need more? Surely, at some point, someone came up with a list of signs for different objects and actions, and now all deaf and hard-of-hearing people use them.

“That Deaf Guy” comic by Matt & Kay Daigle

This is not true. Nowadays, we know about not one, not even ten, but one hundred and seventy different sign languages spread around the world. And it is very possible that there are other sign languages we are not yet aware of. Check out the map from Glottolog, which provides a catalogue of the world’s languages:

Sign languages of the world

Each dot in this map represents a language with its own vocabulary and grammatical structure. The yellow dots are sign languages that developed in urban settings. The blue dots are so-called ‘rural’ sign languages that appeared in small village communities with a high rate of hereditary deafness. Finally, the rare red dots are ‘secondary sign languages’. These languages developed in hearing societies as a substitute for spoken languages in certain situations.

Yes, 170 sign languages is a much more modest number than roughly 6500 spoken languages, but it is definitely more than one. Now, let’s reflect on what sign languages actually are.

Myth №2 Sign languages are a kind of pantomime

Who likes Charades? In this classic team game, you need to enact a title of a book or a movie without saying a single word. Some of these titles can be quite tricky. Have you ever tried to mime “Star Wars Episode V: The Empire Strikes Back”? So, we put forward our best improvisation techniques and we create quite complicated sequences of body movements in order to express the idea we need.

Sign languages do the same thing, don’t they? They express different ideas with movements of the hands and other parts of the body. So, maybe sign languages and pantomime are in fact the same thing? Well, no, not really. You see, one very important feature of a pantomime is transparency. We are usually able to guess what is going on without anyone translating it for us. Sign languages are not so generous. Try to make sense of this short video in Russian Sign Language. I can even give you a hint: the title of this video is ‘Miracles of dog training’.

A short story ‘Miracles of dog training’ in Russian Sign Language

If you are not familiar with Russian Sign Language, you probably didn’t understand that an unlucky man, the main character of this tale, tried to teach his dog to bring him a stick. The dog didn’t quite grasp the concept and instead started bringing him umbrellas, which it would steal from unsuspecting passers-by.

Why is it so hard to understand a sign language? Let me answer this with a counterquestion: why would we expect it to be easy? Well, this assumption stems from the phenomenon called ‘iconicity’. A lot of signs in sign languages look like what they describe. For example, if you watch the video about the dog training again, you will easily find a sign for ‘holding a stick in the mouth’. A tricky thing about iconicity, however, is that it is only evident once you know what the sign means. But can you guess the meaning of an iconic sign? Let’s give it a go! Here is a sign in Russian Sign Language. Can you guess what it means?

An iconic sign in Russian Sign Language

If you are done guessing, here is the answer. This sign means ‘empty’. Once we know this, it seems obvious that a person in this video imitates looking for something in an empty bag. But it is really hard to guess it beforehand.

Another reason for the non-transparency of sign languages is that, unlike pantomime improvised on the spot, sign languages have quite complex rules for forming sentences. Speaking of sentences, let’s bust another widespread myth that has to do with sign language structure.

Myth №3 Sign languages are spoken languages articulated with hands

Many people assume that sign languages are not independent languages, but instead are signed versions of spoken languages. On this view, British, American and Australian Sign Languages are signed versions of English, French Sign Language is a version of French, Russian Sign Language is a version of Russian, and so on. From this point of view, if someone wanted to express a sentence in English with something other than their voice, they could write it down or sign it instead.

However, this is not the case. Many aspects of sign languages are completely unrelated to spoken languages that surround them. Trevor Johnston and Adam Schembri provide a good illustration of this using Australian Sign Language as an example. The English word light has several meanings, such as ‘not heavy’ (as in a light bag), ‘pale’ (as in a light colour), or ‘energy from the sun or lamp that allows us to see things’ (as in turn on the light). Although in English all these meanings are expressed with the same word, they would be translated to Australian Sign Language with three different signs.

Australian Sign Language translations for the English word “light”

Of course, this is not the only kind of difference between sign and spoken languages. Grammars are different too. Sign languages do not have articles, such as a and the in English, or case marking, like Russian Genitive or Dative. They don’t mark plurality and past tense with special endings. Instead, they have their own ways to express time and quantity related information. Many of them revolve around iconicity. But this is a topic for a different post. Stay tuned!

A picture is worth a thousand words: Choosing images for psycholinguistic research

Linguists need to come up with different ways of testing our theories of how particular languages in the world function. We generally rely on two main methods of data collection – linguistic elicitation and corpus collection. With linguistic elicitation a linguist asks a speaker of a language: ‘How do you say “Monty Python is really funny” in your language?’ But can we be sure that what the speaker said is naturalistic and not just a word for word translation?

Linguists need naturalistic data and can also record stories and conversations to build up a representative sample of a language (a corpus). This however takes a lot of time, effort and dedication on the part of both the linguist and the community of speakers of a language. It might even be that – after years of toil – the particular construction that a linguist wants to look at is under-represented with a dearth of examples in the corpus.

Thankfully, there is a happy medium! We can combine cognitive psychological techniques and targeted linguistic elicitation, to create scenarios where speakers produce naturalistic responses. Of course, this technique brings with it another set of problems entirely.

Psycholinguistic experiments need to be carefully designed and can’t be made up on the fly in response to something a speaker of a language says to you; this is drastically different to standard linguistic elicitation where one can continually come up with new sentences to check, while in the middle of working with a speaker of a language.

In our current research on optimal categorisation we aim to find out how different nouns are assigned to different classifiers in a group of six related Oceanic languages spoken in Vanuatu and New Caledonia. Each language has a different inventory size of classifying particles — from two to 23 — which are used in possessive constructions, and categorise the possession in terms of its use or functionality.

Here are a few examples from the Iaai language, spoken in New Caledonia, which has the largest inventory of classifiers in our sample of languages:

(1a)  a-n                  wââ      (b)  hanii-ny              wââ
      FOOD.CLASSIFIER-his  fish          CATCH.CLASSIFIER-his  fish
      ‘his fish (to eat)’                ‘his fish (which he caught)’
(2a)  a-n                  koko     (b)  noo-n                 koko
      FOOD.CLASSIFIER-his  yam           PLANT.CLASSIFIER-his  yam
      ‘his yam (to eat)’                 ‘his yam plant’

We want to see whether or not a particular noun that refers to a particular entity can occur with different classifiers, like with the words for ‘fish’ and ‘yam’ in Iaai above. Also, how does a language with 23 classifiers function differently from a language with just two or three classifiers?

One way in which we can discover how the classifiers function in each language is to use a card sorting experiment. These experiments present speakers with entities in the form of pictures. Speakers are asked to sort them into different groups, first in a “free sort” where they can create groups on any basis they feel is relevant and important, and second, in a “structured sort” where they are asked to group entities according to which classifier they would use in a possessive construction. By doing this with lots of participants we can see how individual speakers vary, both within one language and across languages. We can also get a clear sense of whether and how a language’s classifier system influences the way that speakers think about and process different entities.
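One common way to summarise card sort data is to count how often each pair of pictures ends up in the same group. The sketch below shows that idea in Python; it is not a description of the analysis used in this project, and the function name and the three toy participants are invented purely for illustration.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(sorts):
    """Count, for each pair of pictures, how many participants grouped them together."""
    counts = Counter()
    for groups in sorts:          # one participant's sort
        for group in groups:      # one pile of cards
            # Sort the labels so each pair is always stored in the same order.
            for a, b in combinations(sorted(group), 2):
                counts[(a, b)] += 1
    return counts


# Hypothetical free-sort results from three participants.
sorts = [
    [{"fish", "yam"}, {"canoe"}],
    [{"fish", "yam", "canoe"}],
    [{"fish"}, {"yam", "canoe"}],
]

pairs = co_occurrence(sorts)
print(pairs[("fish", "yam")])  # number of participants who put fish and yam together
```

Pairs with high counts are entities that speakers tend to treat as belonging together, and comparing such counts between the free sort and the structured sort is one way to see whether the classifiers line up with those groupings.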

Once we have decided on which nouns to test in a card sort experiment, we have to find or make pictures that represent these nouns. Sadly I don’t have the artistic skills of Michelangelo and won’t be painting any masterpieces for the experiment!

Choosing what type of image to use is trickier than it sounds, as we are presented with an array of options.

First, should we use simple line drawings? The Noun Project has over 2 million small black and white line drawings. With such a choice of images we can find what we need. Here are some images of yams that I found on the site that we could use for our experiment.

These are great, and I know they are yams because I searched for images of yams on the website. But if I present these images to speakers I want them to tell me what they are. If the images aren’t instantly recognisable then participants will use different nouns to describe what they are seeing – is it a yam? A sweet potato? Manioc? Or some other entity? To tell you the truth, the third picture is actually a sweet potato! But it looks very similar to the first picture of a yam. Another problem is that these images can be quite abstract – and we can’t be sure that these symbolic representations of entities will be shared across different cultural and linguistic groups.

What about black and white pictures? These are cheaper to print and easier to standardise. But we do not see the world in black and white, and presenting entities as black and white pictures may make it harder to identify them, especially when the lightness of the background and the object of focus are similar. We need to be sure that the images we choose are easy to identify or else we can end up with problems of misidentification.

Another possibility is to remove the background of the image. By doing this we can eliminate distractions and help the participant focus on the object in the image. However, the background is often key. Background information gives context that can influence how the speaker of a language perceives the entity in the image.

For instance, speakers may classify a fish that has been caught differently to a fish that is alive and swimming in the sea. The edible classifier is more likely with the former scenario, and a general classifier with the latter. But if we were to remove the background from both of these photos they would look strikingly similar! This leads us onto a very important question – what classifier would speakers of these languages use for a parrot if it was alive or dead?

So now we have decided to present images in colour and keep the background. But we must make sure that the background varies across different images. We don’t want participants to sort the entities into groups based on a colour or shape in the background or some other extraneous visual cue that may appear in several pictures!

For every psycholinguistic experiment that uses images there are multiple decisions that need to be made to figure out what type of image is required. The images we have chosen are specifically tailored to the nature of the languages we are studying to ensure that they are culturally relevant and thus identifiable.

For us, the pictures need to be realistic and represent the world around us. Sadly, we can’t take artistic licence with kangaroos and trampoline acts, as fun as that would be!

 

What do we lose when we lose a language?

By the end of this century we are likely to lose half of the world’s six thousand languages. With each lost language a whole world of thought, customs, traditions, poems, songs, jokes, myths, legends and history gets lost. Knowledge of local plants, herbs, mushrooms and berries, their medicinal and culinary uses disappears, together with names for small rivers, mountains, valleys and forests. And this is only a tiny fragment of what we lose when we lose a language.

For a linguist, the loss of a language is first and foremost the loss of a system with a unique set of properties and rules which make it work. If there are any universal principles behind the architecture of human language, our only hope to figure them out is by studying the multitude of languages still existing on the planet. And endangered languages – those that we were lucky enough to have time and resources to study – show us time and again how vast the range of linguistic variability is. For example, linguists and psychologists long held that grammatical tense can be marked only on verbs, as hundreds and hundreds of languages behave this way. Then we discovered that Kayardild, a moribund language of Australia, marks tense on nouns as well as verbs, making us reconsider this ‘universal’.

Archi, a language spoken in one village in the highlands of Daghestan (Caucasus, Russia), is an endangered language which I have been working on since 2004. There are only about 1300 speakers of this language and, as far as we know, there have never been more than that. Yet for centuries it was spoken in the Archi village (below) and passed to younger generations without being under any threat.

Archi is so small that no writing system was ever invented for it – people in the village did not need to write to each other, and all communication with outsiders happened in one of the larger languages of the area. Until the 1940s this was Lak, then Avar (two large languages of Daghestan), and in the past 40 years, these have been increasingly replaced by Russian. Archi people lived a hard but self-sufficient life keeping sheep in the mountains for themselves and for trading (the alpine pastures within walking distance of Archi village make their lamb hard to compete with) and growing grains, mostly rye, on terraces: narrow strips of land dug into the steep mountain slopes. These grains were just for their own consumption, as it was too hard a job to grow any more than they needed to survive.

We cannot even say that the arrival of television, mobile phones and the internet – which happened more or less at the same time in Archi – is responsible for language decline. It is just that life in the mountains is very hard, so the Archi people are moving to the cities, abandoning their traditional way of life and their language. Since I started working with Archi, two of the village’s primary schools have been closed and others are struggling as young people continue to leave. Kids abandon Archi as soon as they go to school or nursery in town, and their parents tend to follow suit. Older people in the village still wear traditional dress and keep up traditional skills, but the younger generation is moving away from these traditions. And when the last school closes in the village and no more children live there, the language’s fate will be sealed.

What will we lose once Archi is lost? We will lose a verbal system which boasts the largest number of verb forms on record – an Archi verb has up to 1.5 million forms. With this, we will forever lose the opportunity to figure out how the human brain can operate such a humongous system; we won’t be able to watch children learning such a complex language, going through stages of acquisition, making telling mistakes and overgeneralisations (like English-speaking kids do when they go through the stage of producing forms like goed, readed, telled, eated etc.). We will have the knowledge that a system such as the Archi verb existed, but we will never know how it functioned.

We will lose a system of deictic pronouns (like English ‘this’ and ‘that’) which had five words in it. These mark not just the proximity to the speaker (like English this), but also the perspective of the listener, and the vertical position in regard to the speaker (see below). Even if these are not unique as lexical items, the whole linguistic system in which they operate is unique. We don’t know yet how these pronouns work in stories as opposed to conversation, and at the moment we have no good techniques to find this out.

jat ‘this, close to the speaker’
jamut ‘this, close to the hearer’
tot ‘that, far away from the speaker’
godot ‘that, far away and lower than the speaker’
ʁodot ‘that, far away and higher than the speaker’ (the first sound is a bit like the French pronunciation of r)

 

We will lose a system where subject and object in the sentence work differently from what we are used to in European languages. In most European languages, the subjects of transitive and intransitive verbs have the same form (as in He arrived and He brought her along), while the object gets a different marking (She arrived vs. He brought her along). In Archi, the subject of an intransitive verb such as ‘arrive’ is marked the same as the object of a transitive verb such as ‘bring’:

Tuw qa ‘he arrived’

Tormi tuw χir uwli ‘She brought him’.

This is called ergative-absolutive alignment, and it was first brought to the attention of linguists by the Australian language Dyirbal, which has already died out. Languages from several other families of the world, Archi among them, build sentences in the same way. As not many Dyirbal materials have been recorded, it is Archi and other endangered Daghestanian languages that have been making linguists reconsider universals about subject, object and verb relations.
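The difference between the two alignment types comes down to which roles share a case. Here is a minimal Python sketch of that contrast; the role labels S, A and O (intransitive subject, transitive subject, transitive object) are standard typological shorthand rather than something used in the post, and the function name is made up for the example.

```python
def case_of(role: str, alignment: str) -> str:
    """Return the case label for a role under a given alignment type.

    role: 'S' (intransitive subject), 'A' (transitive subject),
          'O' (transitive object).
    """
    if role not in {"S", "A", "O"}:
        raise ValueError(f"unknown role: {role!r}")
    if alignment == "nominative-accusative":
        # English-style: S and A pattern together, O is marked differently.
        return "accusative" if role == "O" else "nominative"
    if alignment == "ergative-absolutive":
        # Archi-style: S and O pattern together, A is marked differently.
        return "ergative" if role == "A" else "absolutive"
    raise ValueError(f"unknown alignment: {alignment!r}")


# tuw in 'Tuw qa' (S) and in 'Tormi tuw χir uwli' (O) gets the same case.
print(case_of("S", "ergative-absolutive"), case_of("O", "ergative-absolutive"))
```

In the Archi examples, tuw ‘he/him’ keeps the same form in both sentences, while it is the one doing the bringing, tormi, that carries the distinct marking.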

This is only a glimpse of the impact that endangered languages have on linguistics as a discipline. In the last few decades, linguists have become much more aware of how invaluable endangered languages are and how fragile their futures, and more and more efforts are now directed to documenting and – whenever possible – preserving the linguistic diversity of the world.

Morphological Redundancy – Why say something twice when once will do?

In Batsbi (a language spoken in the Caucasus in North-East Georgia), if you want to say ‘she is ripping the dress’ you might say something like yoxyoyanw k’ab. In this word, each instance of ‘y’ (highlighted in bold) indicates that it is indeed just one dress that she is ripping.

Linguists call this phenomenon multiple exponence, where a single meaning is indicated within a word more than once, for no apparent reason. This, when you think about it, is pretty weird. Typically we think of languages as incremental in nature: intuitively, we assume that when we add something to a word or a sentence we are adding meaning to that word or sentence. But in multiple exponence this clearly can’t be the case. The dress in the Batsbi example is no more singular than any other singular object in the world, so why have three ‘y’s’ rather than just the one we would expect?

In other words, why say something twice when once will do? The short answer is we don’t know (yet!) – sorry to disappoint! But what I can answer is a slightly different question: what does it actually mean to say something twice?

Multiple exponence is not the only way you might say something twice within a word. There is another phenomenon known as overlapping exponence, where the same meaning is indicated by multiple markers in a word (as with multiple exponence), but each marker is also doing some other job. For example, in Filomeno Mata Totonac (a language from Mexico) you say ‘you are coming’ using the word tanpaati. This word has two suffixes, paa and ti, both of which mean ‘you’ (second person). However, paa also indicates that the event is progressive (like the English –ing), while the other suffix, ti, indicates that the subject is singular rather than plural. So speakers of this language mention twice that it’s you who is coming, but we couldn’t remove either of the suffixes from the word without affecting the meaning, as both of them also tell us something else about what’s going on.

In Wipi, a language spoken in the Fly River Delta on the south coast of Papua New Guinea, if you want to say that you are building two houses you would use the word arangen, which literally means ‘I build two’. This word is rather interesting since you need both the prefix, a, and the suffix, en, to know that this is indeed only two houses as opposed to some other number of houses. Yet neither of these affixes actually means ‘two’. Instead, the suffix en is ambiguous between one and two; we might say it means less than three. The prefix a, in contrast, is used when you are building two or more houses; in other words, it means more than one. Thus, if you are building more than one house but also less than three, there is only one interpretation: you are building two houses. This is called distributed exponence. It’s remarkable that speakers of Wipi say how many houses they are building twice, but in order to know the exact number of houses, you need to listen both times!
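The reasoning behind the Wipi example is essentially a set intersection, which can be sketched in Python. The cap of ten houses is an arbitrary assumption to keep the sets finite, and the variable and function names are invented for the illustration.

```python
# Arbitrary upper bound so the sets stay finite (an assumption for illustration).
MAX_COUNT = 10
ALL_COUNTS = range(1, MAX_COUNT + 1)

# What each affix allows on its own, following the description in the post.
PREFIX_A = {n for n in ALL_COUNTS if n > 1}   # 'more than one'
SUFFIX_EN = {n for n in ALL_COUNTS if n < 3}  # 'less than three'


def compatible_counts(*markers):
    """Return the counts that every marker in the word allows."""
    return set.intersection(*markers)


print(compatible_counts(PREFIX_A, SUFFIX_EN))  # only one count survives
```

Neither set on its own pins down the number, but their overlap contains a single value, which is why a listener needs both affixes.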

The Fly River Delta

It’s amazing really, when you look closely at a simple question like what does it mean to say something twice?, that there is such complexity and diversity in the answer. Beyond what we saw, there are all sorts of in-between cases and the multiple types can interact. As such, teasing them apart can be a real challenge. When I say something twice, it might be that each time gives you more information in subtly different ways. It is untying this kind of subtle diversity which hopefully gives us some hint as to why speakers and languages would ever do such a thing to begin with.

Optimal Categorisation: How do we categorise the world around us?

People love to categorise! We do this on a daily basis, consciously and subconsciously. When we are confronted with something new we try to figure out what it is by comparing it to something we already know. Say, for instance, I saw something flying through the air – I may think to myself that the object is a bird, or I may say it is a plane based on my previous experiences of birds and planes. Of course the object may turn out to be something completely new, perhaps even Superman!

Is it a bird? Is it a plane? No it’s Superman!

Our love of classification runs deep in scientific enquiry. Botanists and zoologists classify plants and animals into different taxonomies. Even the humble linguist loves to classify – is this new word a noun or a verb? What about the new word zoodle that was recently added to the Merriam-Webster dictionary? Is it a thing? Or an action? Can I zoodle something or is it something I can pick up and touch? Well, apparently zoodle is a noun which means ‘a long, thin strip of zucchini that resembles a string or narrow ribbon of pasta’. To be honest, I love eating zoodles, though until now I never knew what they were called!

The way people classify the entities around them has become encoded in the languages we speak in many different ways. The most obvious example arises when we learn a new language like French or German and are confronted with a grammatical gender system. French has two genders: masculine and feminine. German has three: masculine, feminine and neuter. Other languages can have many more gender distinctions. Fula, a language spoken in west and central Africa, has twenty different gender categories!

So what exactly are grammatical gender systems and how are they realised in different languages? Gender systems categorise nouns into different groups and tend to appear not on the noun itself, but on other elements in the phrase. In German, nouns are split into three different gender categories – masculine, feminine and neuter. The gender of a noun is shown by using different articles (the word ‘the’ or ‘a’) and sometimes by changing the ending of an adjective, but never on the noun itself. Thus the word for ‘the’ in German is either der, die or das depending on whether the noun in the phrase is masculine, feminine or neuter.

(1)        der       Mann
              the       man

(2)        die        Frau
              the       woman

(3)        das       Haus
              the       house

This is called ‘agreement’ as the adjectives and articles must agree with the gender of the noun. In a language with gender, each noun typically can only occur in one gender category.

Not every language has a grammatical gender system, but such systems are highly pervasive, with around 40% of all languages having one. English is quite a poor example when it comes to gender. There is no real gender agreement in English, with the exception of pronouns. We have to say: Bill walked into the grocers. He bought some apples. Here the pronoun he must agree with the gender of the noun that was previously mentioned. English uses he, she and it as the only markers of gender agreement.

Languages behave differently in how they allocate nouns to the different genders, which can be very baffling for language learners! Why in French is chair feminine, la chaise, but in German it is masculine, der Stuhl? How a language allocates nouns to its gender categories can seem somewhat arbitrary – the only semantically obvious choices are the words for women and men, which fall into the feminine and masculine genders.

But wait! If you thought the English gender system was dull, think again! A couple of months ago my piano was being restored and when it was being moved back into the lounge the piano movers kept saying: “pull her a little bit more” and “turn her this way”. The movers used the female pronouns to describe the piano. In English, countries, pianos, ships and sometimes even cars use the feminine pronouns.

Grammatical gender isn’t the only way languages classify nouns. Some languages use words called classifiers to categorise nouns. Classifiers are similar to English measure terms, which categorise the noun in terms of its quantity, such as ‘sheet of paper’ vs. ‘pack of paper’ or ‘slice of bread’ vs. ‘loaf of bread’. Classifiers are found in languages all over the world and are able to categorise nouns depending on the shape, size, quantity or use of the referent, e.g. ‘animal kangaroo’ (alive) vs. ‘meat kangaroo’ (not alive). Classifier systems are very different to gender systems as nouns in a language with classifiers can appear with different classifiers depending on what property of the noun you wish to highlight. There are many different types of classifier systems, but to keep things short I am just going to talk about possessive classifiers, which are mainly found in the Oceanic languages, spoken in the South Pacific.

When an item is in your possession we use possessive pronouns in English to say who the item belongs to. For instance if I say ‘my coconut’ – the possessive pronoun is my. In many Oceanic languages a noun can occur with different forms for the word my depending on how the owner intends to use it. For instance the Paamese language, spoken in Vanuatu, has four possessive classifiers. I could use the ‘drinkable’ classifier if I was talking about my coconut that I was going to drink. I would use the ‘edible’ classifier if I was going to eat my coconut. I would use the classifier for ‘land’ if I was talking about the coconut growing in my garden. Finally, I could use the ‘manipulative’ classifier if I was going to use my coconut for some other purpose – perhaps to sit on!

(4)        ani                   mak
              coconut           my.drinkable
              ‘my coconut (that I will drink)’

(5)        ani                   ak
              coconut           my.edible
              ‘my coconut (that I will eat)’

Why do languages have different ways of categorising nouns? How do these systems develop and change over time? Are gender systems easier to learn than classifier systems? Are gender and classifiers completely different systems? Or is there more similarity to them than meets the eye? These are some of the big questions in linguistics and psychology. We are excited to start a new research project at the Surrey Morphology Group, called optimal categorisation: the origin and nature of gender from a psycholinguistic perspective, that seeks to answer these fundamental questions. Over the next three years we will talk more about these fascinating categorisation systems, explain our experimental research methods, introduce the languages and speakers under investigation, and share our findings via this blog. Just look out for the ‘Optimal Categorisation’ headings!

The death of the dual, or how to count sheep in Slovenian

‘How cool is that?’ in German, literally ‘how horny is that then?’

One reason why translation is so difficult – and why computer translations are sometimes unreliable – is that languages are more than just different lists of names for the same universal inventory of concepts. There is rarely a perfect one-to-one equivalence between expressions in different languages: the French word mouton corresponds sometimes to English sheep, and at other times to the animal’s meat, where English uses a separate word lamb or mutton.

This was one of the great insights of Ferdinand de Saussure, arguably the father of modern linguistics. It applies not only in the domain of lexical semantics (word meaning), but also to the categories which languages organise their grammars around. In English, we systematically use a different form of nouns and verbs depending on whether we are referring to a single entity or multiple entities. The way we express this distinction varies: sometimes we make the plural by adding a suffix to the singular (as with hands, oxen), sometimes we change the vowel (foot/feet) and occasionally we don’t mark the distinction on a noun at all, as with sheep (despite the best efforts of this change.org petition to change the singular to ‘shoop’). Still, we can often tell whether someone is talking about one or more sheep by the form of the agreeing verb: compare ‘the sheep are chasing a ball’ to ‘the sheep is chasing a ball’.

Some languages make more fine-grained number distinctions. The English word sheep could be translated as ovca, ovci or ovce in Slovenian, depending on whether you’re talking about one, two, or three or more animals, respectively. Linguists call this extra category between singular and plural the dual. The difference between dual and plural doesn’t show up just in nouns, but also in adjectives and verbs which agree with nouns. So to translate the sentence ‘the beautiful sheep are chasing a ball’, you need to ascertain whether there are exactly two sheep or more than two, not just to translate sheep, but also beautiful and chase.

Lepi ovci lovita žogo
beautiful sheep chase ball
Lepe ovce lovijo žogo
beautiful sheep chase ball

According to some, having a dual number makes Slovenian especially suited for lovers (could this explain the Slovenian tourist board’s decision to title their latest campaign I feel sLOVEnia?). But putting such speculations aside, it’s hard to see what the point of a dual could be. We rarely need to specify whether we are talking about two or more than two entities, and on the rare occasions we do need to make this information explicit, we can easily do so by using the numeral two.

This might be part of the reason why many languages, including English, have lost the dual number. Both English and Slovenian ultimately inherited their dual from Proto-Indo-European, the ancestor of many of the languages of Europe and India. Proto-Indo-European made a distinction between dual and plural number in its nouns, adjectives, pronouns, and verbs, but most of the modern languages descended from it have abandoned this three-way system in favour of a simpler opposition between singular and plural. Today, the dual survives only in two Indo-European languages, Slovenian and Sorbian, both from the Slavic subfamily.

In English the loss of the dual was a slow process, taking place over thousands of years. By the time the predecessor of English had split off from the other Germanic languages, the plural had replaced the dual everywhere except the first and second-person pronouns we and you, and verbs which agreed with them. By the time of the earliest written English texts, the language had lost the dual forms of verbs altogether, but still retained distinct pronouns for ‘we two’ and ‘you two’. By the 15th century, these were replaced by the plural forms, bringing the dual’s final demise.

Grammatical categories do not always disappear without a trace – in some languages the dual has left clues of its earlier existence, even though no functional distinction between dual and plural remains. Like English, German lost its dual, but in some Southern German dialects the dual pronoun enk (cognate with Old English inc, ‘to you two’) has survived instead of the old plural form. In modern dialects of Arabic, plural forms of nouns have generally replaced duals, except in a few words mostly referring to things that usually exist in pairs, like idēn ‘hands’, where the old dual form has survived as the new plural instead. Other languages show vestiges of the dual only in certain syntactic environments. For example, Scottish Gaelic has preserved old dual forms of certain nouns only after the numeral ‘two’: compare aon chas ‘one foot’, dà chois ‘two feet’, trì casan ‘three feet’, casan ‘feet’.

Although duals seem to be on the way out in Indo-European languages, it isn’t hard to find healthy examples in other language families (despite what the Slovenian tourist board might say). Some languages have even more complicated number systems: Larike, one of the languages spoken in Indonesia, has a trial in addition to a dual, which is used for talking about exactly three items. And Lihir, one of the many languages of Papua New Guinea, has a paucal number in addition to both dual and trial, which refers to more than three but not many items. This system of five number categories (singular/dual/trial/paucal/plural) is one of the largest so far discovered. Meanwhile, on the other end of the spectrum are languages which don’t make any number distinction in nouns, like English sheep.