Are words all different? Or are they all the same?

Imagine we have less than a lifetime to describe the words of a given language. We might start from the view that each individual word is a treasure to be described in exquisite detail. Indeed, it is one of the achievements of our field that linguists have found and described gems like the following:

  • Archi (Dagestan) has the word t’uq’ˤ, a stone post inside an underground sheepfold which supports the stone roof.
In Archi, the t’uq’ˤ is the stone post supporting the roof of a sheepfold (Photo credit: Dr. Marina Chumakina).
  • Soq (Papua New Guinea) has the verb s- ‘stay’, which is anti-irregular. While typical irregular verbs (like English go ∼ went) have unexpected forms but mean ‘the right thing’ (went means ‘go in the past’), the Soq verb s- ‘stay’ is the opposite of that. Its forms are unremarkable, but uniquely in the language, its present tense covers the time period of the English present (‘now’). All other verbs have a present tense (sometimes called ‘hodiernal’) which covers the period starting at nightfall yesterday and running through to and including ‘now’.
  • Krongo (Sudan) has the noun m-ùsí ‘sorcerer’, where the initial m- tells us it is singular. The plural is nú-kù-kk-ùs-óoní ‘sorcerers’ with no less than four plural markers, each of which is found independently with other nouns.
  • Russian skovorodá ‘frying pan’ seems remarkable only in that you have to wait for the last syllable to put the stress on the word. But in the plural, the stress moves forward three syllables: skóvorody ‘frying pans’, which makes it sound rather different.
  • English dust. Yes, even English has some star items. The humble verb dust is an example of ‘Gegensinn’, that is, it means its own opposite. We can dust a cake with icing sugar (that is, putting on particles), the opposite of dusting the furniture (removing particles).
    Dusting – even elephants love to do it!

    But there is a danger with this approach: we may well manage a few hundred items, and leave behind an unpublished dictionary. Or we may publish Volumes I-III (A-F), leaving the user stuck for words later in the alphabet: this happens particularly with larger projects, when grand intentions meet organizational and financial reality.

    The alternative approach is to start from the assumption that all words in a language are the same. We soon discover, of course, that this is not quite true. There are dramatic generalizations to be made: we may find, for instance, that many words can occur alone, and some cannot. More generally, different classes of words have different properties of combination with others. That is, we specify part of speech information (verb, noun, and so on). Consistent with this, wholly or partly, we may find regularities such as some words distinguishing tense while others do not. And real dictionaries embody such regularities as defaults. If an English dictionary specifies that compute is a verb, it is taken as given that it will have a past tense, that the form will be computed, and that this past tense will be compositional (we know that what it means is a combination of the lexical meaning of compute and the grammatical meaning of PAST). And when a default is overridden, the information is given in the dictionary entry. For example, the entry will tell us that the past tense of go is went (and only the form need be given, since our default assumption about what it means will hold good), or that binoculars is a noun but lacks a singular.
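    The idea of defaults and overrides can be sketched in a few lines of Python. This is only a toy illustration, not how any real dictionary is stored: the names LEXICON, regular_past, past_tense and has_singular are invented for the example, and the "regular" rule is a deliberately crude approximation of English spelling.

```python
def regular_past(lemma):
    """Default rule (simplified): add -ed, or just -d after a final e."""
    return lemma + ("d" if lemma.endswith("e") else "ed")


# Each entry lists only what departs from the default for its part of speech.
LEXICON = {
    "compute": {"pos": "verb"},                             # nothing to override
    "go": {"pos": "verb", "past": "went"},                  # form overridden
    "binoculars": {"pos": "noun", "has_singular": False},   # singular missing
}


def past_tense(lemma):
    """Use the listed past form if there is one, otherwise the default rule."""
    entry = LEXICON[lemma]
    if entry["pos"] != "verb":
        raise ValueError(f"{lemma} is not a verb")
    return entry.get("past", regular_past(lemma))


def has_singular(lemma):
    """Nouns are assumed to have a singular unless the entry says otherwise."""
    entry = LEXICON[lemma]
    if entry["pos"] != "noun":
        raise ValueError(f"{lemma} is not a noun")
    return entry.get("has_singular", True)


print(past_tense("compute"))       # default applies
print(past_tense("go"))            # override applies
print(has_singular("binoculars"))  # override applies
```

    Note that the entry for go says nothing about meaning: only the form is overridden, and everything else is inherited from the default.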

    I have described not one, but two straw men, though I have met real people who came close to these extremes. The point is that the interest of the linguistic gems we started with comes precisely from the way in which they stand out against the backdrop of the general picture. We know that there are general defaults – otherwise speakers and hearers would not cope. We expect singular and plural of a noun to be linked by a simple formula, rather than by a stress-shift that dramatically changes the way the noun sounds, as with Russian skovoroda ‘frying pan’. So in principle we can start from either end (words are all different or words are all the same), so long as we have the other horizon in view too.

    Don’t forget to destress when using a frying pan in Russian… But if you can’t take the heat, time to get out of the kitchen!

    Of course, real people tend to feel more comfortable working from one end or the other; lexicographers are, arguably, more interested in the differences and linguists more in the generalizations. And there are important movements within the field where dictionary-makers point out the need for much more detailed grammatical information about individual words, and conversely where linguists point out that the broad classes we often work with need to be broken down into rather finer detail.

    A saving grace in all this is the range of possibilities offered by online dictionaries. We can present some of the richness of words in new ways. For example, rather than trying to describe what the pillar that holds up the roof of an underground sheepfold looks like, we can give a picture. The online Archi dictionary does this. And it provides the sound file, so that users can hear what the word sounds like. Indeed, they can hear all the basic forms needed to derive its large array of forms (its extensive paradigm). What if the system of sounds comprising the words has taken years of work to unravel? We want to hear the sounds and see the system. This is something – among other good things – that the new Nuer dictionary offers.

    Browsing in the Archi and Nuer dictionaries makes us marvel at how different those words are, one from another, and perhaps from ‘our’ words. And yet they are all the same too – they all use the same Archi and Nuer systems of sounds, and they fall into parts of speech which are interestingly comparable to ‘our’ parts of speech (verbs and nouns are distinct, and so on).

    It would need several lifetimes for anything approaching a ‘complete’ dictionary of Archi or Nuer. But there are plenty of surprises whichever perspective we take: the dictionary entries tell us about the amazing differences between languages, but the innocent little markers (like v. and n.), and the sets of forms given, point to the equally amazing sameness.

    If you enjoyed this post, why not check out our favourite untranslatable words from the languages we work on.

A “let’s circle back” guy

As everyone knows by now, for the foreseeable future we must all stay at home as much as possible to slow the spread of COVID-19 and reduce the burden on our health services – which has already been substantial, and will soon be enormous even in the best possible scenario.

This shift in the way we operate as a society will have a wide range of effects on our lives, which are already being noticed. Some of these were the kind of thing you might have thought of in advance – but others less so. For example, soon after the advice to work from home really started to bite in the US, a substantial thread developed on Twitter, all started off by the following tweet:

https://twitter.com/inLaurasWords/status/1240687424377720835

The thousands of responses that appeared within a few hours of this tweet show how deeply it resonated: many people must have been through their own version of the same surprising experience, some of them presumably in the last few days. But what happened here, and why was it so surprising? And why, as a linguist, am I sitting at home and writing a blog post about it now?

This single tweet, which people found so easy to identify with, in fact brings together a number of issues that linguists are interested in. For one thing, it works as a clear illustration of a point that people intuitively appreciate, but which has endless ramifications: the language you use is never just an instrument for communicating your thoughts, but is also taken to say something important about your identity, whether you intend it to or not. If a guy uses the expression “let’s circle back”, meaning to return to an issue later, that makes him a “let’s circle back” guy – that is, a particular kind of person. In a jokey way, the tweeter is implying that she already had a mental category of ‘the kind of person who would say things like that’, and she takes it for granted that we do too. In this case, the surprise for Laura Norkin was in suddenly discovering that her own husband belonged in that pre-existing category: the way she tells it, hearing him use a specific turn of phrase counted as finding out important new information about who he is as a person, which she was not necessarily best pleased about.

Making a linguistic choice: a bilingual road sign in Wales

Since the mid-twentieth century, the field of sociolinguistics has drawn attention to the fact that this kind of thing is going on everywhere in language. Consciously or unconsciously, people are making linguistic choices all the time – whether that means choosing between two totally different languages, between two different expressions with the same meaning (do you circle back to something or just return to it?), or between two very slightly different pronunciations of the same word. Any of these choices might turn out to ‘say something’ about how you see yourself – or how other people see you. And the social meanings and values assigned to the different choices are likely to change over time: so understanding what is going on with one person’s use of language really requires you to understand what is going on right across the community, which is like an ecosystem full of co-existing language diversity. How do linguistic developments, and the social responses to them, propagate and interact in this ecosystem? That’s something that researchers work hard to find out.

The tweet also picks up on the importance of the situational context for the way people use language. Laura Norkin had never heard her husband use the offending expression before because it belongs to a particular register – meaning a variety of language which is characteristic of a particular sphere of activity. Circling back is characteristic of ‘full work mode’, something which had never previously needed to surface in the domestic setting.

Why do registers exist? Partly it must be to do with the fact that different people know different things: for example, lawyers can expect to be able to use technical legal terminology with their colleagues, but not with their clients, even if they are talking about all the same issues – because behind the terminology there lies a wealth of specialist knowledge. Similarly, anyone would modify their language when talking to a five-year-old as opposed to a fifty-year-old.

But this cannot be the whole story: it doesn’t help you to explain the difference between returning and circling back. Should we think of the business/marketing/management world, where terms like circling back are stereotypically used, as a mini community within the community, with its own ideas of what counts as normal linguistic practice? Or is everyone involved giving a signal that they take on a new, businesslike identity when they turn up to the office – even if these days that doesn’t involve leaving the house? Again, working out the relationship between the language aspect and the social aspect here makes an interesting challenge for linguistics.

The medical profession is well known for having its own technical register

But this was not just an anecdote about how unusual it is to be at home and yet hear terms that usually turn up at work. We can tell that “let’s circle back”, just like other commonly mocked corporate expressions such as “blue-sky thinking” or “push the envelope”, is something we are expected to dislike – but why? The existence of different registers is not generally thought of as a bad thing in itself. You could give the answer that this expression is overused, a cliché, and thus sounds ugly. But really, things must be the other way round: English abounds in commonly used expressions, and only the ones that ‘sound ugly’ get labelled as overused clichés. And there is nothing inherently worse about circle back than about re-turn – in fact, when you think about it, they are just minor variations on the same metaphor.

So what is really going on here? The popular reaction to circle back, and other things of that kind, seems to involve lots of factors at once. The expression is new enough that people still notice it; but it is not unusual enough to sound novel or imaginative. It is currently restricted to a particular kind of professional setting that most people never find themselves in; but it does not refer to a complex or specific enough concept to ‘deserve’ to exist as a technical term. And we do not tend to worry too much about making fun of the linguistic habits of people who have a relatively privileged position in society: certainly, teasing your husband by outing him as a “let’s circle back” guy is not really going to do him any harm.

Spelling it out like this helps to suggest just how much information we are factoring in whenever we react to the linguistic behaviour of the people around us – and this is something we do all the time, mostly without even noticing. We are social beings, and cannot help looking for the social message in the things people say, as well as the literal message: establishing this fact, and working out how to investigate it scientifically, has been one of the great overarching projects of modern linguistics. Right now, for everyone’s benefit, we need to learn how to be less sociable than ever. But as the tweet above suggests, people’s inbuilt sensitivity to language as a social code is not going to change any time soon.

Arabic based scripts

Scripts spread like bad news. Look at the Latin script, which is the ultimate winner considering the hundreds, if not thousands, of languages that use it today. Political power and religion have caused the Latin script to serve as the basis for this proliferation of written languages, first in Europe, and then almost everywhere else, including many languages that had no written tradition before the Western influence. The exceptions are the scripts whose tradition is strong enough to keep them going.

However, the Latin script is not the only prevalent one. Wikipedia lists 95 languages that are using, or have actively used, the Arabic script. In this post we will be looking at how they do it.

The way different languages use a script can vary significantly. Some can invent new versions of letters that express the peculiar sounds of a language, such as the long vowels in Hungarian: á, í, é, ó, ú, ő, ű. Others, like English, combine existing letters to do the same job, like th or ch. Some will get rid of the letters that are not useful enough. Next time you visit Turkey, look at the taxi signs.

A Turkish taksi

One way we could classify writing systems is by how helpful they are to someone trying to read them. Chinese is famously not very helpful. Even though some characters will give a hint on how to pronounce the word, or what it means, generally you have to learn thousands of characters that refer to separate “words”. English is rather helpful in the sense that the letters generally help the reader figure out what sound is supposed to be pronounced. Not always, thouGH. Sometimes it is touGH to determine how to pronounce GH, for example. Is it /f/, /g/ or /nothing/? Learners have to learn the differences individually. The most helpful scripts represent a speech sound with a single letter consistently. Look at Turkish! Nobody needs an X if you have KS, which does the job perfectly every time.

Arabic is similar to English in this classification, but in a completely different way. In order to understand what is going on, we must know what templatic morphology is. When creating new words, most languages add meaningful bits to the beginning or to the end of a word. Or both, as in the case of my favorite Metallica song, the Un-forgive-n. We can say that English, in most cases, uses a word as a base for such operations. Arabic, on the other hand, uses two or three consonants as a base. They are not words; rather, they represent a broad concept. The schoolbook example is K-T-B, which represents the broad concept of writing. Arabic then adds things before, after and in between (i.e. applies the three consonants to a template). The templates also have meanings and thus narrow down the concept’s meaning to a word that can actually be used in the language. There are only two rules when inserting the three consonants into a template: 1) do not skip any consonant, and 2) keep their order. Let’s see a few examples of how these templates work. The capital letters are the base consonants, and the small letters fill in the template.

Template meaning         K-T-B ‘write’        M-L-K ‘rule, possess’
place where it happens   maKTaBa ‘library’    maMLaKa ‘kingdom’
person who does it       KāTiB ‘writer’       MāLiK ‘king’
passive (being done)     maKTūB ‘written’     maMLūK ‘slave’
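
For readers who like to see the mechanics spelled out, the table can be reproduced with a small Python sketch. The notation is my own assumption for the example: a template is written with the digits 1, 2, 3 marking where the first, second and third root consonants go, and the function name apply_template is invented. It is a simplification that only covers templates in which each consonant appears exactly once.

```python
def apply_template(root, template):
    """Slot the root consonants into the numbered positions of a template.

    root     -- consonants separated by hyphens, e.g. "K-T-B"
    template -- e.g. "ma12a3a", where 1, 2, 3 stand for the root consonants
    """
    consonants = root.upper().split("-")
    slots = [ch for ch in template if ch.isdigit()]
    expected = [str(i + 1) for i in range(len(consonants))]
    # The two rules from the text: no consonant is skipped, and order is kept.
    if slots != expected:
        raise ValueError("template must use every consonant once, in order")
    return "".join(
        consonants[int(ch) - 1] if ch.isdigit() else ch for ch in template
    )


TEMPLATES = {
    "place where it happens": "ma12a3a",
    "person who does it": "1ā2i3",
    "passive (being done)": "ma12ū3",
}

for meaning, template in TEMPLATES.items():
    print(meaning, apply_template("K-T-B", template), apply_template("M-L-K", template))
```

The check on the slots is what enforces the two rules: a template such as "ma21a3a" or "ma1a3a" is rejected rather than silently producing a non-word.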

Long story short, templates are extremely important in Arabic. This is combined with the unfortunate fact that Arabic has lots of consonants and very few vowels, namely, /a/, /u/, and /i/. Each comes in a long and a short version, which gives a total of six vowels. By contrast, there are 28 consonants. Here is a really nice introduction to Arabic speech sounds.

The facts above have led to a writing system where vowels are so ‘underrated’ that they are basically not marked. In fact, the long vowels are marked, but by specific consonant letters, which may either be pronounced as a consonant or be read as a sign that marks a long vowel. To illustrate this, let’s see some Arabic words, the raw information you get from the letters you see, some possible pronunciations (just for fun), and how you actually need to pronounce them.

مورد
raw information [m] [w/ū] [r] [d]
possible pronunciation mawarad, mūrad, mawrad, miward, muwarrid, muwarad…
actual pronunciation mawrid
meaning supplies

مدينة
raw information [m] [d] [y/ī] [n] [a]
possible pronunciation midayna, mudayna, madayna, mudīna, midīna, madīna…
actual pronunciation madīna
meaning city
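
To get a feel for how much guessing the bare letters leave to the reader, here is a rough Python sketch that enumerates candidate readings for the first word above. It rests on loud simplifying assumptions of my own: an unwritten short vowel (a, i, u, or nothing) may follow any consonant except the last one, a consonant directly before a long vowel takes no short vowel, and doubled consonants are ignored (so a reading like muwarrid is not generated). The function name candidate_readings is invented, and the result says nothing about which readings are real Arabic words.

```python
from itertools import product

SHORT_VOWELS = ("", "a", "i", "u")   # "" means no vowel after the consonant
LONG_VOWELS = {"ā", "ī", "ū"}


def candidate_readings(letters):
    """Enumerate readings for a sequence of written letters.

    letters -- one tuple per letter, listing the sounds it may stand for,
               e.g. ("w", "ū") for a letter that is either [w] or long [ū].
    """
    readings = set()
    for choice in product(*letters):
        slots = []
        for i, sound in enumerate(choice):
            is_last = i == len(choice) - 1
            before_long = not is_last and choice[i + 1] in LONG_VOWELS
            if sound in LONG_VOWELS or is_last or before_long:
                slots.append([sound])                       # nothing to insert
            else:
                slots.append([sound + v for v in SHORT_VOWELS])
        for combo in product(*slots):
            readings.add("".join(combo))
    return readings


# Raw information for the first example: [m] [w/ū] [r] [d]
mawrid_letters = [("m",), ("w", "ū"), ("r",), ("d",)]
readings = candidate_readings(mawrid_letters)

print(len(readings))
print("mawrid" in readings)
```

Even under these restrictive assumptions, four letters allow dozens of candidates, and the actual pronunciation mawrid is just one of them. What narrows the choice for a real reader is knowledge of the templates, not anything on the page.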

Arabic has a way of signaling exactly how a word should be pronounced, but these additional signs above and below the main letters (diacritics) are only used in children’s reading books and in the Qur’ān. Nothing above or below the red lines actually appears in everyday texts or in handwriting.

Arabic script

In essence, instead of marking vowels with high precision, Arabic marks the consonants and in most cases, you can figure out the template as well. And if you know Arabic, then you know all the templates, so you don’t even really need those unmarked vowels.

The Arabic writing system fits the Arabic language really neatly, but what about other languages? Persian uses the Arabic script, but it has no templates. It is an Indo-European language with word formation rules that are very similar to the ones we find in European languages. So, how did they deal with this situation? Well, they did their best to mark vowels with a bit more precision. At the ends of words, Persian uses the letter /h/ to mark the vowels /e/ and /a/. The consonant letters that can signal the presence of a long vowel in Arabic are used much more consistently, so when you see one, you can be almost sure that there is a long vowel. Apart from the vowel problem, Persian has also added a couple of consonants that Arabic lacks, such as /p/, /g/ or /ch/.

Urdu is spoken mainly in Pakistan, and it is quite similar to Hindi (just how similar is a matter of political debate), so let’s stick to one fact: it has retroflex consonants, made with the tip of the tongue curled backwards. These are the speech sounds that make many Indic languages sound so recognizable. Urdu’s strategy is similar to what we saw in Persian, with the addition of letters for the retroflex consonants. There is also an additional, second form of the letter h, which signals aspiration (the h-like sound after consonants, as in the words dharma, makhani or bhaji). The last addition is a differently shaped letter y, which marks /ay/ or /ey/, as opposed to a long /ī/. In Persian and Arabic, there is only one letter that represents these three sounds.

Urdu is also special in that printed Urdu texts use a style of calligraphy called Nasta’liq. This makes Urdu texts look very different from Arabic, but it is only a matter of fonts.

Arabic newspaper
Urdu newspaper

Lastly, let’s discuss a language that has completely reformed the Arabic script. Uyghur is a Turkic language spoken in the Xinjiang Uyghur Autonomous Region in Northwest China. Like all Turkic languages, Uyghur has a large number of vowels and relatively few consonants. This makes the Arabic script a rather difficult choice for this language, unless some modifications are made. In the Uyghur script, every speech sound is represented in a consistent way, i.e. there is no ambiguity whatsoever. The set of consonants is essentially the same as in Persian, but there are nine additional letters that allow for a precise marking of vowels. To anybody else from the world of Arabic based scripts, the resulting text may appear somewhat weird. The following image illustrates how different this script is from the previous ones. The parts circled are the Uyghur innovations that would be incorrect in Arabic, Persian or Urdu. Notice their proportion.

Uyghur script

The cherry on the cake is the Thaana script. It is used to write Dhivehi, an Indo-European language spoken in the Maldives. This script is based on Arabic, but in a unique way. Thaana started off as a secret script for sacred, religious texts. It was considered a form of encryption, and therefore the letters originate from Arabic letters, as well as Arabic numerals and Indic numerals (!). Imagine that you encode a message so that it looks like this: 7q۳۶gt55۹۴. All speech sounds are precisely marked, as in Uyghur. Notice the vowel-marking diacritics above and below the main letters, and their similarity to the Arabic diacritics (in the picture above where the diacritics are separated with a red line). But of course, this script looks really different from the other ones we have seen.

Dhivehi newspaper

Linguists believe that only a handful of writing systems appeared independently around the world. Most languages had to adopt the script of another language, and due to different needs and strategies, we have ended up with a myriad of historically related but still different scripts. Linguists consider writing systems negligible, since they are just the representation of language, which is what we are truly interested in. I think, however, that the backgrounds of different scripts are amazing.

Cushty Kazakh

With thousands of miles between the East End of London and the land of Kazakhs, cushty was the last word one expected to hear one warm spring afternoon in the streets of Astana (the capital of Kazakhstan, since renamed Nur-Sultan). The word cushty (meaning ‘great, very good, pleasing’) is usually associated with the Cockney dialect of the English language which originated in the East End of London.

Del Boy from Only Fools and Horses

Check out Del Boy’s Cockney sayings (Cushty from 4:04 to 4:41).

Cockney is still spoken in London now, and the word is often used to refer to anyone from London, although a true Cockney would disagree with that, and would proudly declare her East End origins. More specifically, a true ‘Bow-bell’ Cockney comes from the area within hearing distance of the church bells of St. Mary-le-Bow, Cheapside, London.

Due to its strong association with modern-day London, the word ‘Cockney’ might be perceived as being one with a fairly short history. This could not be further from the truth as its etymology goes back to a late Middle English 14th century word cokenay, which literally means a “cock’s egg” – a useless, small, and defective egg laid by a rooster (which does not actually produce eggs). This pejorative term was later used to denote a spoiled or pampered child, a milksop, and eventually came to mean a town resident who was seen as affected or puny.

The pronunciation of the Cockney dialect is thought to have been influenced by Essex and other dialects from the east of England, while the vocabulary contains many borrowings from Yiddish and Romany (cushty being one of those borrowings – we’ll get back to that in a bit!). One of the most prominent features of Cockney pronunciation is the glottalisation of the sound [t], which means that [t] is pronounced as a glottal stop: [ʔ]. Another interesting feature of Cockney pronunciation is called th-fronting, which means that the sounds usually represented by the letter combination th ([θ] as in ‘thanks’ and [ð] as in ‘there’) are replaced by the sounds [f] and [v]. These (and some other) phonological features characteristic of the Cockney dialect have now spread far and wide across London and other areas, partly thanks to the popularity of television shows like “Only Fools and Horses” and “EastEnders”.

As far as grammar is concerned, the Cockney dialect is distinguished by the use of me instead of my to indicate possession; heavy use of ain’t in place of am not, is not, are not, has not, have not; and the use of double negation which is ungrammatical in Standard British English: I ain’t saying nuffink to mean I am not saying anything.

Having borrowed words, Cockney also gave back generously, with derivatives from Cockney rhyming slang becoming a staple of the English vernacular. The rhyming slang tradition is believed to have started in the early to mid-19th century as a way for criminals and wheeler-dealers to code their speech beyond the understanding of police or ordinary folk. The code is constructed by way of rhyming a phrase with a common word, but only using the first word of that phrase to refer to the word. For example, the phrase apples and pears rhymes with the word stairs, so the first word of the phrase – apples – is then used to signify stairs: I’m going up the apples. Another popular and well-known example is dog and bone – telephone, so if a Cockney speaker asks to borrow your dog, do not rush to hand over your poodle!
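
The way the code is built can be sketched in a few lines of Python, using only the two examples above. The names RHYMING_PHRASES, slang_word and decode are invented for the illustration, and the sketch simply trusts that each phrase rhymes with its target rather than checking the rhyme.

```python
# Rhyming phrase -> the everyday word it rhymes with (examples from the text).
RHYMING_PHRASES = {
    "apples and pears": "stairs",
    "dog and bone": "telephone",
}


def slang_word(phrase):
    """Only the first word of the rhyming phrase is actually spoken."""
    return phrase.split()[0]


def decode(word):
    """Recover the intended meaning from the spoken slang word, if known."""
    for phrase, meaning in RHYMING_PHRASES.items():
        if slang_word(phrase) == word:
            return meaning
    return None


print(slang_word("apples and pears"))
print(decode("dog"))
```

The sketch also shows why the code works as a code: the spoken word (apples, dog) no longer rhymes with anything, so without the full phrase in your head there is no way back to stairs or telephone.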

https://youtu.be/MSbWz1PIJY8
Test your knowledge of Cockney rhyming slang!

Right, so did I encounter a Cockney walking down the field of wheat (street!) in Astana saying how cushty it was? Perhaps it was a Kazakh student who had recently returned from his studies in London and couldn’t quite switch back to Kazakh? No and no. It was a native speaker of Kazakh reacting in Kazakh to her interlocutor’s remark on the new book she’d purchased by saying күшті [kyʃ.ˈtɨ], which sounds incredibly close to cushty [ˈkʊʃ.ti]. The meanings of the words and the contexts in which they can be used are remarkably similar too. The Kazakh күшті literally means ‘strong’; colloquially, however, it is used to mean ‘wonderful, great, excellent’ – it really would not be out of place in any of Del Boy’s remarks in the YouTube video above! Surely, the two kushtis have to be related, right? Well…

Recall that cushty is a borrowing from Romany (Indo-European) kushto/kushti, and that Romany, in turn, is known to have borrowed from Persian and Arabic. In the case of the Romany kushto/kushti, the borrowing could have been from the Persian khoši meaning ‘happiness’ or ‘pleasure’. It would have been very neat if this could be linked to the Kazakh күшті, but there seems to be no connection there… Kazakh is a Turkic language and the etymology of күшті can be traced back to the Old Turkic root küč meaning ‘power’, which does not seem to have been borrowed from or connected with Persian. Certainly, had we been able to go back far enough, we might have found a common Indo-European-Turkic root in some Proto-Proto-Proto-Language. As things stand now, all we can do is admire what appears to be a wonderful coincidence, and enjoy the journeys on which a two-syllable word you overheard in the street might take you.

A picture is worth a thousand words: Choosing images for psycholinguistic research

As linguists, we need to come up with different ways of testing our theories of how particular languages in the world function. We generally rely on two main methods of data collection – linguistic elicitation and corpus collection. With linguistic elicitation a linguist asks a speaker of a language: ‘How do you say “Monty Python is really funny” in your language?’ But can we be sure that what the speaker said is naturalistic and not just a word-for-word translation?

Linguists need naturalistic data and can also record stories and conversations to build up a representative sample of a language (a corpus). This however takes a lot of time, effort and dedication on the part of both the linguist and the community of speakers of a language. It might even be that – after years of toil – the particular construction that a linguist wants to look at is under-represented, with very few examples in the corpus.

Thankfully, there is a happy medium! We can combine cognitive psychological techniques and targeted linguistic elicitation, to create scenarios where speakers produce naturalistic responses. Of course, this technique brings with it another set of problems entirely.

Psycholinguistic experiments need to be carefully designed and can’t be made up on the fly in response to something a speaker of a language says to you; this is drastically different to standard linguistic elicitation where one can continually come up with new sentences to check, while in the middle of working with a speaker of a language.

In our current research on optimal categorisation we aim to find out how different nouns are assigned to different classifiers in a group of six related Oceanic languages spoken in Vanuatu and New Caledonia. Each language has a different inventory size of classifying particles — from two to 23 — which are used in possessive constructions, and categorise the possession in terms of its use or functionality.

Here are a few examples from the Iaai language, spoken in New Caledonia, which has the largest inventory of classifiers in our sample of languages:

(1a)  a-n                   wââ      (b)  hanii-ny               wââ
      FOOD.CLASSIFIER-his   fish          CATCH.CLASSIFIER-his   fish
      ‘his fish (to eat)’                 ‘his fish (which he caught)’
(2a)  a-n                   koko     (b)  noo-n                  koko
      FOOD.CLASSIFIER-his   yam           PLANT.CLASSIFIER-his   yam
      ‘his yam (to eat)’                  ‘his yam plant’

We want to see whether or not a particular noun that refers to a particular entity can occur with different classifiers, as with the words for ‘fish’ and ‘yam’ in Iaai above. Also, how does a language with 23 classifiers function differently from a language with just two or three classifiers?

One way in which we can discover how the classifiers function in each language is to use a card sorting experiment. These experiments present speakers with entities in the form of pictures. Speakers are asked to sort them into different groups, first in a “free sort” where they can create groups on any basis they feel is relevant and important, and second, in a “structured sort” where they are asked to group entities according to which classifier they would use in a possessive construction. By doing this with lots of participants we can see how individual speakers vary in their language usage, both within one language and across languages. We can also get a clear sense of whether and how a language’s classifier system is influencing the way that speakers think about and process different entities.
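
One simple way to summarise the results of such a sort, sketched here in Python, is to count how often each pair of pictures ends up in the same group across participants. This is only an illustration of the general idea and not a description of our actual analysis: the function name co_occurrence is invented, and the three "participants" below are made-up toy data, not experimental results.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(sorts):
    """Count, for each pair of items, how many participants grouped them together.

    sorts -- one entry per participant; each entry is a list of groups,
             and each group is a set of item names.
    """
    counts = Counter()
    for groups in sorts:
        for group in groups:
            # Sorting gives every pair a single, predictable key.
            for pair in combinations(sorted(group), 2):
                counts[pair] += 1
    return counts


# Made-up toy data: three imaginary participants sorting three pictures.
toy_sorts = [
    [{"fish", "yam"}, {"parrot"}],
    [{"fish", "yam", "parrot"}],
    [{"fish"}, {"yam", "parrot"}],
]

pair_counts = co_occurrence(toy_sorts)
for pair, n in sorted(pair_counts.items()):
    print(pair, n)
```

Comparing such counts between the free sort and the structured sort would be one way to see whether the groups speakers form spontaneously line up with the groups their classifiers impose.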

Once we have decided on which nouns to test in a card sort experiment we have to find or make pictures that represent them. Sadly I don’t have the artistic skills of Michelangelo and won’t be painting any masterpieces for the experiment!

Choosing what type of image is trickier than it sounds as we are presented with an array of options.

First, should we use simple line drawings of the entities? The Noun Project has over 2 million small black and white line drawings. With such a choice of images we can find what we need. Here are some images of yams that I found on the site that we could use for our experiment.

These are great, and I know they are yams because I searched for images of yams on the website. But if I present these images to speakers I want them to tell me what they are. If the images aren’t instantly recognisable then participants will use different nouns to describe what they are seeing – is it a yam? A sweet potato? Manioc? Or some other entity? To tell you the truth, the third picture is actually a sweet potato! But it looks very similar to the first picture of a yam. Another problem is that these images can be quite abstract – and we can’t be sure that these symbolic representations of entities will be shared across different cultural and linguistic groups.

What about black and white pictures? These are cheaper to print and easier to standardise. But we do not see the world in black and white, and presenting entities as black and white pictures may make it harder to identify them, especially when the lightness of the background and the object of focus are similar. We need to be sure that the images we choose are easy to identify or else we can end up with problems of misidentification.

Another possibility is to remove the background of the image. By doing this we can eliminate distractions and help the participant focus on the object in the image. However, the background is often key. Background information gives context that can influence how the speaker of a language perceives the entity in the image.

For instance, speakers may classify a fish that has been caught differently to a fish that is alive and swimming in the sea. The edible classifier is more likely with the former scenario, and a general classifier with the latter. But if we were to remove the background from both of these photos they would look strikingly similar! This leads us onto a very important question – what classifier would speakers of these languages use for a parrot if it was alive or dead?

So now we have decided to present images in colour and keep the background. But we must make sure that the background varies across different images. We don’t want participants to sort the entities into groups based on a colour or shape in the background or some other extraneous visual cue that may appear in several pictures!

For every psycholinguistic experiment that uses images there are multiple decisions that need to be made to figure out what type of image is required. The images we have chosen are specifically tailored to the nature of the languages we are studying to ensure that they are culturally relevant and thus identifiable.

For us, the pictures need to be realistic and represent the world around us. Sadly, we can’t take artistic licence with kangaroos and trampoline acts, as fun as that would be!

 

Poolish


Courtesy of thefreshloaf.com

Those who have chosen out of desire, or been forced out of dire necessity, to bake their own bread may have encountered the term poolish. It refers to a semi-liquid pre-ferment used in bread-making, a mixture of half water and half white flour mixed with a teeny bit of yeast and allowed to slowly ferment for several hours, up to a day, before mixing up the final dough.

The word itself is an exceedingly odd one, and has been the source of much head-scratching and inconclusive speculation among bread-bakers across the world: it looks like the English word Polish, but is spelled funny, and anyway seems to be borrowed from French, where the spelling would be funnier still. Most discussions of the technique include the obligatory etymological digression, usually fantastical, involving journeymen Polish bakers fanning out over Europe. Linguists too have gotten on the trail: David Gold’s Studies in Etymology and Etiology (2009) devotes a whole page to the question, but does not get too far.

In its current form it is technical jargon from French commercial baking, and has probably made its way to a broader public through Raymond Calvel’s influential Le goût du pain (‘The taste of bread’) from 1990. In his account:

This method of breadmaking was first developed in Poland during the 1840s, from whence its name. It was then used in Vienna by Viennese bakers, and it was during this same period that it became known in France. (2001 edition translated by Ronald Wirtz)

This explanation has been widely accepted, and appears in one form or another in any number of bread-baking books. But how could it even be true? The first problem is the word itself. Poolish is not the French word for Polish, and doesn’t much look like a French word anyway. In earlier French texts it crops up as pouliche, which looks more French and is indeed the word for a young mare, whose connection to bread dough is tenuous at best. But earlier French texts also have the spelling poolisch or polisch, which looks rather more German than French and suggests we follow the Viennese trail instead.

This thread of inquiry has its own potential hiccoughs. The German word for Polish is polnisch, with an [n], so would this not just be fudging things? Actually not: polisch, poolisch, pohlisch or pollisch turn up often enough in older texts as alternative words for ‘Polish’, particularly in southern varieties of German that include Austria. And it is exactly in these forms that we find it being used to refer to this particular process, juxtaposed with Dampfl (or Dampfel or Dampel), the term in southern Germany and Austria for a rather stiffer pre-ferment which goes through a shorter rising period, as in these two examples from 1865, one from Leopold Wimmer’s self-published advertising screed for St. Marxer brand (of Vienna) pressed yeast, where it turns up as Pohlisch:

the other from Ignaz Reich’s (of Pest, as in Budapest) account of ancient Hebrew baking practices, where it’s rendered as pollisch.

The term polisch (in all its variants) in this sense seems to have died a natural death in German, only to reemerge during the current craft-baking revival in the guise of poolish.

But if poolish was originally the (or a) German word for Polish, we run up against the sticky question of what it was actually referring to. Calvel repeats the story that this technique was invented by Polish bakers (which turns up in a 1972 article in The Atlantic Monthly, I think anyway, because it’s but coyly revealed by Google in snippet view), a supposition which lacks as much plausibility as it does historical attestation. Poland has traditionally been a land of sourdough rye bread. It seems unlikely that a novel technique involving the use both of white wheat flour and commercial pressed yeast (a relatively new product) would have been devised there and introduced into the imperial capital that was Vienna. So what on earth could it have meant?

Here I make my own foray into speculation; you read it here first. Poland is not just a land of sourdough rye bread, it is a land of a soup made from rye sourdough: żur or żurek (itself derived from sur, one variant of the German word for ‘sour’), still widely consumed and also sold in ready form for time-strapped gourmands. Since the Austro-Hungarian Empire included much of what had once been Poland, it isn’t too far-fetched to think that people in Vienna might have been familiar with this soup. And since the salient characteristic of poolish is that it is basically liquid, in opposition to more solid doughs, my guess is that the term poolish arose as a facetious allusion to żur: a soup-like fermenting dough mixture, like the thinned-out sourdough soup that Poles eat.

This theory has the minor drawback of lacking any positive evidence in its favor. So far the only 19th century reference to żur outside of its normal context that I have been able to find is as a cure for equine distemper, otherwise known as ‘strangles’. That leads us into the topic of pluralia tantum disease names…

What do we lose when we lose a language?


By the end of this century we are likely to lose half of the world’s six thousand languages. With each lost language a whole world of thought, customs, traditions, poems, songs, jokes, myths, legends and history gets lost. Knowledge of local plants, herbs, mushrooms and berries, their medicinal and culinary uses disappears, together with names for small rivers, mountains, valleys and forests. And this is only a tiny fragment of what we lose when we lose a language.

For a linguist, the loss of a language is first and foremost the loss of a system with a unique set of properties and rules which make it work. If there are any universal principles behind the architecture of human language, our only hope to figure them out is by studying the multitude of languages still existing on the planet. And endangered languages – those that we were lucky enough to have time and resources to study – show us time and again how vast the range of linguistic variability is. For example, it has been thought and stated by linguists and psychologists that grammatical tense can be marked by verbs only, as hundreds and hundreds of languages behave this way. Then we discovered that Kayardild, a morbidly endangered language of Australia, marks tense on nouns as well as verbs, making us reconsider this ‘universal’.

Archi, a language spoken in one village in the highlands of Daghestan (Caucasus, Russia), is an endangered language which I have been working on since 2004. There are only about 1300 speakers of this language and, as far as we know, there never have been more than that. Yet for centuries it was spoken in the Archi village (below) and passed to younger generations without being under any threat.

Because the community is so small, no writing system was ever invented for Archi – people in the village did not need to write to each other, and all communication with outsiders happened in one of the larger languages of the area. Until the 1940s this was Lak, then Avar (two large languages of Daghestan), and in the past 40 years, these have been increasingly replaced by Russian. Archi people lived a hard but self-sufficient life keeping sheep in the mountains for themselves and for trading (the alpine pastures within walking distance of Archi village make their lamb hard to compete with) and growing grains, mostly rye, on terraces: narrow strips of land dug into the steep mountain slopes. These grains were just for their own consumption, as it was too hard a job to grow any more than they needed to survive.

We cannot even say that the arrival of television, mobile phones and the internet – which happened more or less at the same time in Archi – is responsible for language decline. It is just that life in the mountains is very hard, so the Archi people have started moving to the cities, abandoning their traditional way of life and their language. Since I started working with Archi, two of the village’s primary schools have been closed and others are struggling as young people continue to leave. Kids abandon Archi as soon as they go to school or nursery in town, and their parents tend to follow suit. Older people in the village still wear traditional dress and keep up traditional skills, but the younger generation is moving away from these traditions. And when the last school closes in the village and no more children live there, the language’s fate will be sealed.

What will we lose once Archi is lost? We will lose a verbal system which boasts the largest number of verb forms registered – an Archi verb has up to 1.5 million forms. With this, we will forever lose the opportunity to figure out how the human brain can operate such a humongous system; we won’t be able to watch children learning such a complex language, going through stages of acquisition, making telling mistakes and overgeneralisations (like English kids do when they go through the stage of producing forms like goed, readed, telled, eated etc). We will have the knowledge that a system such as the Archi verb existed, but we will never know how it functioned.

We will lose a system of deictic pronouns (like English ‘this’ and ‘that’) which had five words in it. These mark not just the proximity to the speaker (like English this), but also the perspective of the listener, and the vertical position in regard to the speaker (see below). Even if these are not unique as lexical items, the whole linguistic system in which they operate is unique. We don’t know yet how these pronouns work in stories as opposed to conversation, and at the moment we have no good techniques to find this out.

jat ‘this, close to the speaker’
jamut ‘this, close to the hearer’
tot ‘that, far away from the speaker’
godot ‘that, far away and lower than the speaker’
ʁodot (the first sound is a bit like the French pronunciation of r) ‘that, far away and higher than the speaker’

 

We will lose a system where subject and object in the sentence work differently from what we are used to in European languages. In most European languages, the subjects of transitive and intransitive verbs have the same form (as in He arrived and He brought her along), while the object gets a different marking (She arrived vs. He brought her along). In Archi, the subject of an intransitive verb such as ‘arrive’ is marked the same as the object of a transitive verb such as ‘bring’:

Tuw qa ‘he arrived’

Tormi tuw χir uwli ‘She brought him’.

This is called Ergative-Absolutive alignment, and was first brought to the attention of linguists by the Australian language Dyirbal, which is now already dead. Languages in several other families of the world, Archi among them, build their sentences in the same way. As not many Dyirbal materials have been recorded, it is Archi and other endangered Daghestanian languages that have been making linguists reconsider universals about subject, object and verb relations.

This is only a glimpse of the impact that endangered languages have on linguistics as a discipline. In the last few decades, linguists have become much more aware of how invaluable endangered languages are and how fragile their futures, and more and more efforts are now directed to documenting and – whenever possible – preserving the linguistic diversity of the world.

Morphological Redundancy – Why say something twice when once will do?


In Batsbi (a language spoken in the Caucasus in North-East Georgia), if you want to say ‘she is ripping the dress’ you might say something like yoxyoyanw k’ab. In this word, each instance of ‘y’ (highlighted in bold) indicates that it is indeed just one dress that she is ripping.

Linguists call this phenomenon multiple exponence, where a single meaning is indicated within a word more than once, for no apparent reason. This, when you think about it, is pretty weird. Typically we think of languages as incremental in nature: intuitively, we assume that when we add something to a word or a sentence we are adding meaning to that word or sentence. But in multiple exponence this clearly can’t be the case. The dress in the Batsbi example is no more singular than any other singular object in the world, so why have three ‘y’s rather than just the one we would expect?

In other words, why say something twice when once will do? The short answer is we don’t know (yet!) – sorry to disappoint! But what I can answer is a slightly different question: what does it actually mean to say something twice?

Multiple exponence is not the only way you might say something twice within a word. There is another phenomenon known as overlapping exponence, where the same meaning is indicated by multiple markers in a word (as with multiple exponence), but each marker is also doing some other job. For example, in Filomeno Mata Totonac (a language from Mexico) you say ‘you are coming’ using the word tanpaati. This word has two suffixes, paa and ti, both of which mean ‘you’ (second person). However, paa also indicates that the event is progressive (like the English –ing), while the other suffix ti indicates that the subject is singular rather than plural. So speakers of this language mention twice that it’s you who is coming, but we couldn’t remove either of the suffixes from the word without affecting the meaning, as both of them also tell us something else about what’s going on.

In Wipi, a language spoken in the Fly River Delta on the south coast of Papua New Guinea, if you want to say that you are building two houses you would use the word arangen, which literally means ‘I build two’. This word is rather interesting since you need both the prefix, a, and the suffix, en, to know that this is indeed only two houses as opposed to some other number of houses. Yet neither of these affixes actually means ‘two.’ Instead, the suffix en is ambiguous between one or two; we might say it means less than three. The prefix a, in contrast, is used when you are building two or more houses; in other words, it means more than one. Thus, if you are building more than one house but also less than three, there is only one interpretation: you are building two houses. This is called distributed exponence. It’s remarkable that speakers of Wipi say how many houses they are building twice, but in order to know the exact number of houses, you need to listen both times!
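The logic of the Wipi example can be sketched as a set intersection: each affix narrows down the possible number of houses, and only the two together pin it to exactly two. This is a toy model under simplifying assumptions; the cap of five houses and the names `PREFIX_A`, `SUFFIX_EN` and `interpret` are invented for illustration and make no claim about how Wipi number marking works in full.

```python
# Toy model: each marker's meaning is the set of house counts it allows.
# The upper limit of 5 is arbitrary, chosen only to keep the sets small.
COUNTS = set(range(1, 6))

PREFIX_A = {n for n in COUNTS if n > 1}    # 'more than one'
SUFFIX_EN = {n for n in COUNTS if n < 3}   # 'less than three'

def interpret(*markers):
    """Intersect the sets contributed by each marker on the word."""
    result = set(COUNTS)
    for marker in markers:
        result &= marker
    return result

print(interpret(PREFIX_A))             # several counts remain possible
print(interpret(SUFFIX_EN))            # one or two
print(interpret(PREFIX_A, SUFFIX_EN))  # only two is left
```

Neither set on its own contains just one number, which is the sense in which the meaning ‘two’ is distributed across the word.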

The Fly River Delta

It’s amazing really, when you look closely at a simple question like what does it mean to say something twice?, that there is such complexity and diversity in the answer. Beyond what we saw, there are all sorts of in-between cases and the multiple types can interact. As such, teasing them apart can be a real challenge. When I say something twice, it might be that each time gives you more information in subtly different ways. It is untying this kind of subtle diversity which hopefully gives us some hint as to why speakers and languages would ever do such a thing to begin with.

Sense and polarity, or why meaning can drive language change


Generally a sentence can be negative or positive depending on what one actually wants to express. Thus if I’m asked whether I think that John’s new hobby – say climbing – is a good idea, I can say It’s not a good idea; conversely, if I do think it is a good idea, I can remove the negation not to make the sentence positive and say It’s a good idea. Both sentences are perfectly acceptable in this context.

From such an example, we might therefore conclude that any sentence can be made positive by removing the relevant negative word – most often not – from the sentence. But if that is the case, why is the non-negative response I like it one bit odd, even unacceptable, when its negative counterpart I don’t like it one bit is perfectly acceptable and natural?

This contrast has to do with the expression one bit: notice that if it is removed, then both negative and positive responses are perfectly fine: I could respond I don’t like it or, if I do like it, I (do) like it.

It seems that there is something special about the phrase one bit: it wants to be in a negative sentence. But why? It turns out that this question is a very big puzzle, not only for English grammar but for the grammar of most (all?) languages. For instance in French, the expression bouger/lever le petit doigt ‘lift a finger’ must appear in a negative sentence. Thus if I know that John wanted to help with your house move and I ask you how it went, you could say Il n’a pas levé le petit doigt (lit. ‘He didn’t lift the small finger’) if he didn’t help at all, but you could not say Il a levé le petit doigt (lit. ‘He lifted the small finger’) even if he did help to some extent.

Expressions like lever le petit doigt ‘lift a finger’, one bit, care/give a damn, own a red cent are said to be polarity sensitive: they only really make sense if used in negative sentences. But this in itself is not the most interesting property.

What is much more interesting is why they have this property. There is a lot of research on this question in theoretical linguistics. The proposals are quite technical but they all start from the observation that most expressions that need to be in a negative context to be acceptable are expressions of minimal degrees and measures. For instance, a finger or le petit doigt ‘the small finger’ is the smallest body part one can lift to do something, a drop (in the expression I didn’t drink a drop of vodka yesterday) is the smallest observable quantity of vodka, etc.

Regine Eckardt, who has worked on this topic, formulates the following intuition: ‘speakers know that in the context of drinking, an event of drinking a drop can never occur on its own – even though a lot of drops usually will be consumed after a drinking of some larger quantity.’ (Eckardt 2006, p. 158). However, so the intuition goes, the occurrence of this expression in a negative sentence is acceptable because it denies the existence of events that consist of just drinking one drop.

What this means is that if Mary drank a small glass of vodka yesterday, although it is technically true to say She drank a drop of vodka (since the glass contains many drops) it would not be very informative, certainly not as informative as saying the equally true She drank a glass of vodka.

However, imagine now that Mary didn’t drink any alcohol at all yesterday. In this context, I would be telling the truth if I said either one of the following sentences: Mary didn’t drink a glass of vodka or Mary didn’t drink a drop of vodka. But now it is much more informative to say the latter. To see this, consider the following: saying Mary didn’t drink a glass of vodka could describe a situation in which Mary didn’t drink a glass of vodka yesterday but she still drank some vodka, maybe just a spoonful. If, however, I say Mary didn’t drink a drop of vodka then this can only describe a situation where Mary didn’t drink a glass or even a little bit of vodka. In other words, saying Mary didn’t drink a drop of vodka yesterday is more informative than saying Mary didn’t drink a glass of vodka yesterday because the former sentence describes a very precise situation whereas the latter is a lot less specific as to what it describes (i.e. it could be uttered in a situation in which Mary drank a spoonful of vodka or maybe a cocktail that contains 2ml of vodka, etc).
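The comparison can be made concrete with a toy model in which a sentence is more informative the fewer situations it is compatible with. The list of situations and the sizes assumed for a drop and a glass are invented for illustration; this is a sketch of the informativeness argument and not an implementation of any particular semantic theory.

```python
# Hypothetical situations: how much vodka Mary drank yesterday, in millilitres.
SITUATIONS = [0, 2, 15, 40, 100]

DROP_ML = 0.05   # assumed size of a drop
GLASS_ML = 40    # assumed size of a small glass

def did_not_drink(threshold_ml):
    """Situations in which 'Mary didn't drink <threshold>' is true."""
    return {s for s in SITUATIONS if s < threshold_ml}

no_glass = did_not_drink(GLASS_ML)
no_drop = did_not_drink(DROP_ML)

print(sorted(no_glass))    # includes the spoonful and cocktail cases
print(sorted(no_drop))     # only the case where she drank nothing
print(no_drop < no_glass)  # proper subset: the 'drop' sentence rules out more
```

Every situation that makes the ‘drop’ sentence true also makes the ‘glass’ sentence true, but not the other way round, so the ‘drop’ sentence tells the hearer more.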

By using expressions of minimal degrees/measures in negative environments, the sentences become a lot more informative. This, it seems, is part of the reason why languages like English have changed such that these words are now only usable in negative sentences.

Double trouble treble


You’ll get in trouble if you drink a tripel, the strong pale ale brewed by the most hipster of monks, the Trappists.

The Lowlands are the Hoxton of Europe

Tripels have three times the strength (around 8-10% ABV) of the standard table beer historically consumed by the monks themselves. This enkel or ‘single’ beer was traditionally not available outside the cloisters, while the dubbel (a double strength dark brown beer made with caramelized beet sugar) was sold to provide income for the monastery. Although the term enkel is no longer in common beer parlance (it is on the cusp of a comeback), dubbel and tripel have held their ground. It is generally thought that the tripel takes its name from its threefold strength, but it is also sometimes claimed that it is because it has three times the malt of a regular brew. A quadrupel is VERY strong.

As we have seen already in this blog when counting sheep in Slovenian and yams in Ngkolumbu, means for the expression of quantities and multiplication are often linguistically fascinating. Not least the doublet treble and triple, which originate from the same etymological source.

The Latin word triplus ‘threefold, triple’ first entered English via Old French treble. Not satisfied with claiming the space previously occupied by the Old English adjective þrifeald ‘threefold’, it turned up again by the 15th century as the adjective triple.

This triad of modifiers (threefold, treble and triple) exemplifies some of the pathways by which lexical synonymy can come about. The first word was formed through a compounding process (i.e. the numeral three forming a new word with the multiplicative form –fold), the second entered the language through direct borrowing, and the third through a second wave of borrowing (either from Old French triple or Latin triplus).

We don’t just find words competing to express the same meaning, but also parts of words. The –fold element of threefold, tenfold and manifold, and the –plus of triplus, are argued to have developed from the same Proto-Indo-European root *pel ‘to fold’. To complicate things even further, the now obsolete treblefold was attested between the 14th and 16th centuries. Words, it seems, like to fight for the same space, and can sometimes be incestuous.

Since entering English over 500 years ago, triple and treble have staked out different paths, but retained similar meanings in at least some of their manifestations, as explored by Catherine Soanes on the OxfordWords blog. In terms of frequency, triple is the stronger twin (or is it a triplet? quadruplet?), ending up triumphant with around 6 times more occurrences in the Oxford English Corpus.

But treble has some resilience. Although the official Scrabble board has double and triple word scores, treble word scores are occasionally referred to on the net (albeit erroneously, or in a devil-may-care way), such as in Charlie Brooker’s article on how to cheat at Scrabble. I even found a ‘threefold word score’ on a Scrabble knock-off site. Lawyers to the ready!

This demonstrates that these adjectives really are semantically interchangeable for the most part, even though their distributions are not identical.

The take home? While not every monastery sells the same tripel, they will all get you drunk.