Arabic-based scripts
Scripts spread like bad news. Look at the Latin script, which is the ultimate winner considering the hundreds, if not thousands, of languages that use it today. Political power and religion made the Latin script the basis for this proliferation of written languages, first in Europe and then almost everywhere else, including many languages that had no written tradition before Western influence. The exceptions are the scripts with a tradition strong enough to keep them going.
However, the Latin script is not the only prevalent one. Wikipedia lists 95 languages that use, or have actively used, the Arabic script. In this post we will look at how they do it.
The way different languages use a script can vary significantly. Some invent new versions of letters to express the peculiar sounds of the language, such as the long vowels in Hungarian: á, í, é, ó, ú, ő, ű. Others, like English, combine existing letters to do the same job, like th or ch. Some get rid of the letters that are not useful enough. Next time you visit Turkey, look at the taxi signs.
One way to classify writing systems is by how helpful they are to someone who intends to read them. Chinese is famously not very helpful. Even though some characters give a hint about how to pronounce the word or what it means, you generally have to learn thousands of characters that refer to separate “words”. English is rather helpful in the sense that the letters generally help the reader figure out what sound is supposed to be pronounced. Not always, thouGH. Sometimes it is touGH to determine how to pronounce GH, for example. Is it /f/, /g/ or /nothing/? Learners have to learn the differences individually. The most helpful scripts consistently represent each speech sound with a single letter. Look at Turkish! Nobody needs an X if you have KS, which does the job perfectly at all times.
Arabic is similar to English in this classification, but in a completely different way. In order to understand what is going on, we must know what templatic morphology is. When creating new words, most languages add meaningful bits to the beginning or to the end of a word. Or both, as in the case of my favorite Metallica song, the Un-forgive-n. We can say that English, in most cases, uses a word as the base for such operations. Arabic, on the other hand, uses two or three consonants as a base. They are not words; rather, they represent a broad concept. The schoolbook example is K-T-B, which represents the broad concept of writing. Arabic then adds things before, after and in between (i.e. it applies the three consonants to a template). The templates also have meanings, and thus narrow the broad concept down to a word that can actually be used in the language. There are only two rules when inserting the three consonants into a template: 1) do not skip any consonant, and 2) keep their order. Let’s see a few examples of how these templates work. The capital letters are the base consonants, and the small letters fill in the template.
| Template meaning | K-T-B ‘write’ | M-L-K ‘rule, possess’ |
|---|---|---|
| place where it happens | maKTaBa ‘library’ | maMLaKa ‘kingdom’ |
| person who does it | KāTiB ‘writer’ | MāLiK ‘king’ |
| passive (being done) | maKTūB ‘written’ | maMLūK ‘slave’ |
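The two insertion rules can be sketched in a few lines of Python. This is only an illustration on the romanized forms from the table: the digit notation for templates (1, 2, 3 for the first, second and third root consonant) and the function name `apply_template` are my own assumptions, not a standard linguistic notation.

```python
def apply_template(root, template):
    """Fill the digit slots of a template with the root consonants.

    root     -- consonants separated by hyphens, e.g. "K-T-B"
    template -- lowercase filler letters plus digits 1, 2, 3 that mark
                where the first, second and third consonant go
                (a made-up notation for this sketch)
    """
    consonants = root.split("-")
    result = []
    for ch in template:
        if ch.isdigit():
            # A digit is a slot: take the consonant with that number.
            result.append(consonants[int(ch) - 1])
        else:
            # Anything else belongs to the template itself.
            result.append(ch)
    return "".join(result)


# Templates from the table, written in the slot notation.
PLACE = "ma12a3a"   # place where it happens
DOER = "1ā2i3"      # person who does it
PASSIVE = "ma12ū3"  # passive (being done)

for root in ("K-T-B", "M-L-K"):
    for template in (PLACE, DOER, PASSIVE):
        print(root, template, apply_template(root, template))
```

As long as a template lists the slots as 1, 2, 3 from left to right, both rules hold automatically: no consonant is skipped and their order is kept.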
Long story short, templates are extremely important in Arabic. This is combined with the unfortunate fact that Arabic has lots of consonants and very few vowels, namely /a/, /u/ and /i/. Each of them comes in a long and a short version, which gives a total of six vowels. By contrast, there are 28 consonants. Here is a really nice introduction to Arabic speech sounds.
The facts above have led to a writing system where vowels are so ‘underrated’ that they are basically not marked. In fact, the long vowels are marked, but by specific consonant letters that may either be pronounced as consonants or be read as signs of a long vowel. To illustrate this, let’s see some Arabic words, the raw information you get from the letters you see, some possible pronunciations, just for fun, and how you actually need to pronounce them.
|raw information||[m] [w/ū] [r] [d]|
|possible pronunciation||mawarad, mūrad, mawrad, miward, muwarrid, muwarad…|
|raw information||[m] [d] [y/ī] [n] [a]|
|possible pronunciation||midayna, mudayna, madayna, mudīna, midīna, madīna…|
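To get a feeling for how quickly the possibilities multiply, here is a rough Python sketch that enumerates readings for a bare consonant skeleton. It is a deliberate simplification and not a model of Arabic: it only assumes that each consonant except the last may be followed by nothing or by one of the three short vowels, and it ignores long-vowel readings of w and y, doubled consonants and the templates themselves. The function name `candidate_readings` is made up for this example.

```python
from itertools import product

# Nothing, or one of the three short vowels of Arabic.
SHORT_VOWELS = ["", "a", "i", "u"]


def candidate_readings(skeleton):
    """Yield every way to put a short vowel (or nothing) after each
    consonant of the skeleton except the last one."""
    if not skeleton:
        return
    slots = [SHORT_VOWELS] * (len(skeleton) - 1)
    for fill in product(*slots):
        # Pair each non-final consonant with its vowel, then add the last one.
        body = "".join(c + v for c, v in zip(skeleton, fill))
        yield body + skeleton[-1]


readings = list(candidate_readings("mwrd"))
print(len(readings))          # 4 choices in 3 slots
print("mawrad" in readings)
print("miward" in readings)
```

Even under these crude assumptions a four-letter skeleton has 64 spellings-out, which is why knowing the templates matters so much to a reader.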
Arabic has a way of signaling exactly how a word should be pronounced, but these additional signs above and below the main letters (diacritics) are only used in children’s reading books and in the Qur’ān. Nothing above and below the red lines actually appears in everyday texts or in handwriting.
In essence, instead of marking vowels with high precision, Arabic marks the consonants, and in most cases you can figure out the template as well. And if you know Arabic, then you know all the templates, so you don’t even really need those unmarked vowels.
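The step from a fully marked text to an everyday one can be imitated in Python. The sketch below assumes that the vowel signs are stored as Unicode combining marks, which is how the Arabic short-vowel diacritics are encoded; it simply drops every combining character using the standard `unicodedata` module. The function name `strip_diacritics` is my own, and the approach is blunt: it would also remove any other combining mark in the text.

```python
import unicodedata


def strip_diacritics(text):
    """Remove every combining mark, keeping only the base letters."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


# "kataba" (he wrote), fully vocalized: kāf + fatḥa, tā' + fatḥa, bā' + fatḥa
vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"

# What an everyday text shows: just the three consonants K-T-B
bare = strip_diacritics(vocalized)

print(len(vocalized), len(bare))
print(bare == "\u0643\u062A\u0628")
```

Going in the other direction, from the bare letters back to the vowels, is the part that needs a reader who knows the language.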
The Arabic writing system fits the Arabic language really neatly, but what about other languages? Persian uses the Arabic script, but it has no templates. It is an Indo-European language with word formation rules that are very similar to the ones we find in European languages. So, how did they deal with this situation? Well, they did their best to mark vowels with a bit more precision. At the ends of words, Persian uses the letter /h/ to mark the vowels /e/ and /a/. The consonant letters that can signal the presence of a long vowel in Arabic are used much more consistently, so when you see one, you can be almost sure that there is a long vowel. Apart from the vowel problem, Persian has also added a couple of consonants that Arabic lacks, such as /p/, /g/ or /ch/.
Urdu is spoken mainly in Pakistan, and it is quite similar to Hindi. There is a political debate about that, so let’s stick to the facts: Urdu has retroflex consonants (the tip of the tongue curls backwards). Those are the speech sounds that make many Indic languages sound so recognizable. Urdu’s strategy is similar to what we saw in Persian, with the addition of letters for the retroflex consonants. There is also an additional, second form of the letter h that signals aspiration (the h-like sound after consonants, as in the words dharma, makhani or bhaji). The last addition is a differently shaped letter y that marks /ay/ or /ey/, as opposed to a long /ī/. In Persian and Arabic, there is only one letter that represents these three sounds.
Urdu is also special in that printed Urdu texts use a calligraphic style called Nasta’liq. This makes Urdu texts look very different from Arabic, but it is only a matter of fonts.
Lastly, let’s discuss a language that has completely reformed the Arabic script. Uyghur is a Turkic language spoken in the Xinjiang Uyghur Autonomous Region in Northwest China. Like all Turkic languages, Uyghur has a large number of vowels and relatively few consonants. This makes the Arabic script a rather difficult choice for this language, unless some modifications are made. In the Uyghur script, every speech sound is represented in a consistent way, i.e. there is no ambiguity whatsoever. The set of consonants is essentially the same as in Persian, but there are nine additional letters that allow for a precise marking of vowels. To anybody else from the world of Arabic-based scripts, the resulting text may appear somewhat weird. The following image illustrates how different this script is from the previous ones. The circled parts are the Uyghur innovations that would be incorrect in Arabic, Persian or Urdu. Notice their proportion.
The cherry on the cake is the Thaana script. It is used to write Dhivehi, an Indo-European language spoken in the Maldives. This script is based on Arabic, but in a unique way. Thaana started off as a secret script for sacred, religious texts. It was considered a form of encryption, and therefore the letters originate from Arabic letters, as well as from Arabic numerals and Indic numerals (!). Imagine that you encode a message so that it looks like this: 7q۳۶gt55۹۴. All speech sounds are precisely marked, as in Uyghur. Notice the vowel-marking diacritics above and below the main letters, and their similarity to the Arabic diacritics (in the picture above where the diacritics are separated with a red line). But of course, this script looks really different from the other ones we have seen.
Linguists believe that only a handful of writing systems appeared independently around the world. Most languages had to adopt the script of another language, and due to different needs and strategies, we have ended up with a myriad of historically related, but still different, scripts. Linguists consider writing systems negligible, since they are just the representation of language, which is what we are truly interested in. I think, however, that the backgrounds of different scripts are amazing.