Arabic-based scripts

Scripts spread like bad news. Look at the Latin script, which is the ultimate winner considering the hundreds, if not thousands, of languages that use it today. Political power and religion made the Latin script the basis for this proliferation of written languages, first in Europe and then almost everywhere else, including many languages that had no written tradition before Western influence. The exceptions are scripts with a tradition strong enough to keep them going.

However, the Latin script is not the only prevalent one. Wikipedia lists 95 languages that use, or have actively used, the Arabic script. In this post we will look at how they do it.

The way different languages use a script can vary significantly. Some can invent new versions of letters that express the peculiar sounds of a language, such as the long vowels in Hungarian: á, í, é, ó, ú, ő, ű. Others, like English, combine existing letters to do the same job, like th or ch. Some will get rid of the letters that are not useful enough. Next time you visit Turkey, look at the taxi signs.

A Turkish taksi

One way to classify writing systems is by how helpful they are to someone trying to read them. Chinese is famously not very helpful. Even though some characters give a hint about how to pronounce the word or what it means, you generally have to learn thousands of characters that refer to separate “words”. English is rather helpful in the sense that the letters generally help the reader figure out what sound is supposed to be pronounced. Not always, thouGH. Sometimes it is touGH to determine how to pronounce GH, for example. Is it /f/, /g/ or /nothing/? Learners have to learn the differences individually. The most helpful scripts represent each speech sound with a single letter, consistently. Look at Turkish! Nobody needs an X when KS does the job perfectly every time.

Arabic is similar to English in this classification, but in a completely different way. To understand what is going on, we must know what templatic morphology is. When creating new words, most languages add meaningful bits to the beginning or to the end of a word. Or both, as in my favorite Metallica song, The Un-forgive-n. We can say that English, in most cases, uses a word as the base for such operations. Arabic, on the other hand, uses a small set of consonants, typically three, as the base. They are not words; rather, they represent a broad concept. The schoolbook example is K-T-B, which represents the broad concept of writing. Arabic then adds things before, after and in between (i.e. it applies the three consonants to a template). The templates have meanings too, and thus narrow the concept down to a word that can actually be used in the language. There are only two rules when inserting the three consonants into a template: 1) do not skip any consonant, and 2) keep their order. Let’s see a few examples of how these templates work. The capital letters are the base consonants, and the small letters fill in the template.

Template meaning         K-T-B ‘write’        M-L-K ‘rule, possess’
place where it happens   maKTaBa ‘library’    maMLaKa ‘kingdom’
person who does it       KāTiB ‘writer’       MāLiK ‘king’
passive (being done)     maKTūB ‘written’     maMLūK ‘slave’
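
The two rules above (use every consonant, keep their order) can be sketched in a few lines of Python. This is only an illustration of the idea, not a model of Arabic morphology: the digit notation for the template slots (1, 2, 3) and the names apply_template, TEMPLATES and derive are assumptions made for this example.

```python
def apply_template(root, template):
    """Fill the numbered slots of a template with the root consonants.

    Slot 1 takes the first consonant, slot 2 the second, and so on, so no
    consonant is skipped and their order is kept.
    """
    consonants = root.split("-")
    filled = []
    for ch in template:
        if ch.isdigit():
            filled.append(consonants[int(ch) - 1])  # a slot for a root consonant
        else:
            filled.append(ch)  # template material is copied as it is
    return "".join(filled)


# The three templates from the table, in the hypothetical digit notation.
TEMPLATES = {
    "place where it happens": "ma12a3a",
    "person who does it": "1ā2i3",
    "passive (being done)": "ma12ū3",
}


def derive(root):
    """Apply every template in TEMPLATES to one root."""
    return {meaning: apply_template(root, t) for meaning, t in TEMPLATES.items()}


for root in ("K-T-B", "M-L-K"):
    for meaning, word in derive(root).items():
        print(f"{root}  {meaning}: {word}")
```

Calling derive("K-T-B") reproduces the middle column of the table, and derive("M-L-K") the right-hand one.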

Long story short, templates are extremely important in Arabic. This is combined with the unfortunate fact that Arabic has lots of consonants and very few vowels, namely /a/, /u/ and /i/. Each of them comes in a long and a short version, which gives a total of six vowels. By contrast, there are 28 consonants. Here is a really nice introduction to Arabic speech sounds.

The facts above have led to a writing system where vowels are so ‘underrated’ that they are basically not marked. More precisely, the long vowels are marked, but by letters that double as consonants: the same letter may be pronounced as a consonant or read as the sign of a long vowel. To illustrate this, let’s look at some Arabic words: the raw information you get from the letters you see, some possible pronunciations (just for fun), and how you actually need to pronounce them.

مورد
raw information:          [m] [w/ū] [r] [d]
possible pronunciations:  mawarad, mūrad, mawrad, miward, muwarrid, muwarad…
actual pronunciation:     mawrid
meaning:                  supplies

مدينة
raw information:          [m] [d] [y/ī] [n] [a]
possible pronunciations:  midayna, mudayna, madayna, mudīna, midīna, madīna…
actual pronunciation:     madīna
meaning:                  city
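
To get a feeling for how much the reader has to fill in, here is a rough Python sketch that enumerates candidate readings of a consonant skeleton. The model is a deliberate simplification and entirely an assumption of this example: every letter except the last may be followed by nothing or by one short vowel, w and y may instead be read as the long vowels ū and ī, and doubled consonants (as in muwarrid) are ignored. It overgenerates badly, which is rather the point.

```python
from itertools import product

SHORT_VOWELS = ["", "a", "i", "u"]    # "" means no vowel after the consonant
LONG_READING = {"w": "ū", "y": "ī"}   # letters that can also mark a long vowel


def candidate_readings(skeleton):
    """List every reading of a consonant skeleton under the toy model above."""
    slots = []
    last = len(skeleton) - 1
    for i, letter in enumerate(skeleton):
        if i == last:
            options = [letter]  # the final consonant ends the word here
        else:
            options = [letter + vowel for vowel in SHORT_VOWELS]
        if letter in LONG_READING and i > 0:
            options.append(LONG_READING[letter])  # read it as a long vowel
        slots.append(options)
    return sorted({"".join(choice) for choice in product(*slots)})


readings = candidate_readings(["m", "w", "r", "d"])
print(len(readings), "candidates, including:",
      [r for r in readings if r in ("mawrid", "mūrad", "miward")])
```

Even this crude model produces dozens of candidates for four letters; a reader who knows the templates discards almost all of them at once.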

Arabic has a way of signaling exactly how a word should be pronounced, but these additional signs above and below the main letters (diacritics) are only used in children’s reading books and in the Qur’ān. Nothing above or below the red lines actually appears in everyday texts or in handwriting.

Arabic script

In essence, instead of marking vowels with high precision, Arabic marks the consonants and in most cases, you can figure out the template as well. And if you know Arabic, then you know all the templates, so you don’t even really need those unmarked vowels.
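
Going from a fully marked spelling to the everyday one is easy to imitate in code. The sketch below uses Python's standard unicodedata module and drops every character in the Unicode category Mn (nonspacing mark), which includes the Arabic vowel signs. Treating "all nonspacing marks" as "the diacritics" is an assumption of this example and is rougher than what a real text-processing tool would do; the names strip_diacritics, VOCALISED and PLAIN are made up for the illustration.

```python
import unicodedata


def strip_diacritics(text):
    """Remove every nonspacing combining mark (Unicode category Mn)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# madīna 'city' with its vowel signs written out:
# m + fatha (a), d + kasra (i), y, n + fatha (a), final letter
VOCALISED = "\u0645\u064e\u062f\u0650\u064a\u0646\u064e\u0629"

PLAIN = strip_diacritics(VOCALISED)
print(VOCALISED, "->", PLAIN)
```

The result is the five-letter spelling used in the example above. The reverse direction, putting the vowel signs back, is the hard part, and it is exactly what a reader does with the help of the templates.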

The Arabic writing system fits the Arabic language really neatly, but what about other languages? Persian uses the Arabic script, but it has no templates. It is an Indo-European language with word formation rules that are very similar to the ones we find in European languages. So, how did they deal with this situation? Well, they did their best to mark vowels with a bit more precision. At the ends of words, Persian uses the letter /h/ to mark the vowels /e/ and /a/. The consonants that can signal the presence of a long vowel in Arabic are used much more consistently, so when you see one, you can be almost sure that there is a long vowel. Apart from the vowel problem, Persian has also added a couple of consonants that Arabic lacks, such as /p/, /g/ or /ch/.

Urdu is spoken mainly in Pakistan and is quite similar to Hindi. How similar is a matter of political debate, so let’s stick to one fact: it has retroflex consonants (pronounced with the tip of the tongue curled backwards). These are the speech sounds that make many Indic languages sound so recognizable. Urdu’s strategy is similar to what we saw in Persian, with added letters for the retroflex consonants. There is also a second form of the letter h that signals aspiration (the h-like sound after consonants, as in the words dharma, makhani or bhaji). The last addition is a differently shaped letter y that marks /ay/ or /ey/, as opposed to a long /ī/. In Persian and Arabic, a single letter represents all three of these sounds.

Urdu is also special in that printed Urdu texts use a calligraphic style called Nasta’liq. This makes Urdu texts look very different from Arabic ones, but it is only a matter of fonts.

Arabic newspaper
Urdu newspaper

Lastly, let’s discuss a language that has completely reformed the Arabic script. Uyghur is a Turkic language spoken in the Xinjiang Uyghur Autonomous Region in Northwest China. Like all Turkic languages, Uyghur has a large number of vowels and relatively few consonants. This makes the Arabic script a rather difficult choice for this language unless some modifications are made. In the Uyghur script, every speech sound is represented in a consistent way, i.e. there is no ambiguity whatsoever. The set of consonants is essentially the same as in Persian, but there are nine additional letters that allow for a precise marking of vowels. To anybody else from the world of Arabic-based scripts, the resulting text may appear somewhat weird. The following image illustrates how different this script is from the previous ones. The circled parts are the Uyghur innovations that would be incorrect in Arabic, Persian or Urdu. Notice their proportion.

Uyghur script

The cherry on the cake is the Thaana script. It is used to write Dhivehi, an Indo-European language spoken in the Maldives. This script is based on Arabic, but in a unique way. Thaana started off as a secret script for sacred, religious texts. It was considered a form of encryption, and so its letters derive not only from Arabic letters but also from Arabic numerals and Indic numerals (!). Imagine encoding a message so that it looks like this: 7q۳۶gt55۹۴. All speech sounds are precisely marked, as in Uyghur. Notice the vowel-marking diacritics above and below the main letters, and their similarity to the Arabic diacritics (in the picture above, where the diacritics are separated off by red lines). But of course, this script looks really different from the other ones we have seen.

Dhivehi newspaper

Linguists believe that only a handful of writing systems appeared independently around the world. Most languages had to adopt the script of another language, and because of different needs and strategies, we have ended up with a myriad of historically related but still different scripts. Linguists tend to consider writing systems negligible, since they are just a representation of language, which is what we are truly interested in. I think, however, that the backgrounds of different scripts are amazing.

The death of the dual, or how to count sheep in Slovenian

‘How cool is that?’ in German, literally ‘how horny is that then?’

One reason why translation is so difficult – and why computer translations are sometimes unreliable – is that languages are more than just different lists of names for the same universal inventory of concepts. There is rarely a perfect one-to-one equivalence between expressions in different languages: the French word mouton corresponds sometimes to English sheep, and at other times to the animal’s meat, where English uses a separate word lamb or mutton.

This was one of the great insights of Ferdinand de Saussure, arguably the father of modern linguistics. It applies not only in the domain of lexical semantics (word meaning), but also to the categories which languages organise their grammars around. In English, we systematically use a different form of nouns and verbs depending on whether we are referring to a single entity or multiple entities. The way we express this distinction varies: sometimes we make the plural by adding a suffix to the singular (as with hands, oxen), sometimes we change the vowel (foot/feet) and occasionally we don’t mark the distinction on a noun at all, as with sheep (despite the best efforts of this change.org petition to change the singular to ‘shoop’). Still, we can often tell whether someone is talking about one or more sheep by the form of the agreeing verb: compare ‘the sheep are chasing a ball’ to ‘the sheep is chasing a ball’.

Some languages make more fine-grained number distinctions. The English word sheep could be translated as ovca, ovci or ovce in Slovenian, depending on whether you’re talking about one, two, or three or more animals, respectively. Linguists call this extra category between singular and plural the dual. The difference between dual and plural doesn’t show up just in nouns, but also in adjectives and verbs which agree with nouns. So to translate the sentence ‘the beautiful sheep are chasing a ball’, you need to ascertain whether there are two or more sheep, not just to translate sheep, but also beautiful and chase.

Lepi ovci lovita žogo
beautiful sheep chase ball
Lepe ovce lovijo žogo
beautiful sheep chase ball
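
The choice a translator has to make can be written down as a small Python sketch. The dual and plural forms are the ones in the two example sentences; the singular forms lepa and lovi do not appear in the text above and are added here as an assumption to complete the paradigm. The function names and the FORMS table are likewise invented for this illustration, and the sketch covers only the grammatical number category, not the forms Slovenian uses after numerals.

```python
def number_category(count):
    """Map a count of entities to Slovenian's three-way number category."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        return "singular"
    if count == 2:
        return "dual"
    return "plural"


# adjective 'beautiful', noun 'sheep', verb 'chase', one row per category
FORMS = {
    "singular": ("Lepa", "ovca", "lovi"),   # adjective and verb forms assumed
    "dual": ("Lepi", "ovci", "lovita"),
    "plural": ("Lepe", "ovce", "lovijo"),
}


def beautiful_sheep_chase_ball(count):
    """Build 'the beautiful sheep chase(s) a ball' for the given number of sheep."""
    adjective, noun, verb = FORMS[number_category(count)]
    return f"{adjective} {noun} {verb} žogo"


for n in (1, 2, 3, 40):
    print(n, beautiful_sheep_chase_ball(n))
```

Note that the count changes three words at once, while the English sentence only changes the verb, and only between one sheep and more than one.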

According to some, having a dual number makes Slovenian especially suited for lovers (could this explain the Slovenian tourist board’s decision to title their latest campaign I feel sLOVEnia?). But putting such speculations aside, it’s hard to see what the point of a dual could be. We rarely need to specify whether we are talking about two or more than two entities, and on the rare occasions we do need to make this information explicit, we can easily do so by using the numeral two.

This might be part of the reason why many languages, including English, have lost the dual number. Both English and Slovenian ultimately inherited their dual from Proto-Indo-European, the ancestor of many of the languages of Europe and India. Proto-Indo-European made a distinction between dual and plural number in its nouns, adjectives, pronouns, and verbs, but most of the modern languages descended from it have abandoned this three-way system in favour of a simpler opposition between singular and plural. Today, the dual survives only in two Indo-European languages, Slovenian and Sorbian, both from the Slavic subfamily.

In English the loss of the dual was a slow process, taking place over thousands of years. By the time the predecessor of English had split off from the other Germanic languages, the plural had replaced the dual everywhere except in the first- and second-person pronouns we and you, and in the verbs which agreed with them. By the time of the earliest written English texts, the language had lost the dual forms of verbs altogether, but still retained distinct pronouns for ‘we two’ and ‘you two’. By the 15th century, these had been replaced by the plural forms, bringing about the dual’s final demise.

Grammatical categories do not always disappear without a trace – in some languages the dual has left clues of its earlier existence, even though no functional distinction between dual and plural remains. Like English, German lost its dual, but in some Southern German dialects the dual pronoun enk (cognate with Old English inc, ‘to you two’) has survived instead of the old plural form. In modern dialects of Arabic, plural forms of nouns have generally replaced duals, except in a few words mostly referring to things that usually exist in pairs, like idēn ‘hands’, where the old dual form has survived as the new plural instead. Other languages show vestiges of the dual only in certain syntactic environments. For example, Scottish Gaelic has preserved old dual forms of certain nouns only after the numeral ‘two’: compare aon chas ‘one foot’, dà chois ‘two feet’, trì casan ‘three feet’, casan ‘feet’.

Although duals seem to be on the way out in Indo-European languages, it isn’t hard to find healthy examples in other language families (despite what the Slovenian tourist board might say). Some languages have even more complicated number systems: Larike, one of the languages spoken in Indonesia, has a trial in addition to a dual, which is used for talking about exactly three items. And Lihir, one of the many languages of Papua New Guinea, has a paucal number in addition to both dual and trial, which refers to more than three but not many items. This system of five number categories (singular/dual/trial/paucal/plural) is one of the largest so far discovered. Meanwhile, at the other end of the spectrum are languages which don’t make any number distinction in nouns, just as English makes none in sheep.