Writing Systems of India

This document presents a brief introduction to the writing systems followed for the languages of India.

Though all the languages of India use a nearly common set of consonants and vowels, there are significant differences between the writing systems, which are characterized by their scripts. In the past, a language was not always associated with a specific script, though language-specific scripts had come into vogue. Today, nine scripts are in use and these cover all the National languages. The Urdu script is not covered in this document.

Writing systems for the different languages of India are based on the representation of syllables as opposed to the use of the letters of the alphabet as seen in English or other European languages. Text written in Indian scripts has the general structure,

S1 S2 S3 ... SN

where each S is an appropriate representation of a specific syllable. The displayed form of a syllable varies with the script used for the writing system, and it is perfectly acceptable for a given language to be written in more than one script. Thus in Indian languages, a specific script need not be associated with only one language. For instance, the Telugu or Malayalam scripts may well be used for Sanskrit, as has been the practice for centuries.

The guiding principle underlying the writing systems is basically that the displayed shape for a syllable can be generated using certain rules which combine the shapes of individual consonants and the vowel the syllable is made of.  When a syllable is pronounced, it is strictly in the order of the consonants terminated by the vowel which always combines with the last consonant in the syllable.  In linguistic parlance, the displayed shape for a syllable is generally referred to as a consonant cluster or sometimes a conjunct. 

Each syllable is made up of one or more consonants and a vowel. A pure vowel is also reckoned as a syllable. A pure consonant does not have any vowel associated with it and is more like a phoneme. Pure consonants are difficult to pronounce in the absence of a combining vowel or other consonants. For this reason, linguistic scholars in India have always referred to a consonant as the physical body (lifeless) and the vowel as the life giving agent. A language cannot exist with only vowels or consonants. The combination is actually the essence of a language and these combinations are nothing but sounds that distinguish languages of the world.

We can represent a syllable as

C C ... C V, where each C is a consonant and V is the vowel that ends the syllable.

While this may imply that arbitrary syllables may be formed, it is generally impractical to pronounce such combinations. So in reality, only a finite set of these C C ... V combinations is meaningful in any given Indian language. The majority of syllables in Indian languages are known to consist of just two or three consonants, though occasionally one finds four or even five consonants making a syllable.
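The consonants-then-vowel structure can be made concrete with a short sketch. The Python below is only an illustration and rests on assumptions that are not part of this document: the text is taken to be encoded in Unicode Devanagari, where a cluster is stored as consonants joined by the virama sign and a bare consonant letter carries the inherent vowel "a". The function name split_syllables is hypothetical.

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (joins consonants in a cluster)
MODIFIERS = "\u0901\u0902\u0903\u093C"  # candrabindu, anusvara, visarga, nukta


def is_consonant(ch):
    # Devanagari consonant letters KA..HA
    return "\u0915" <= ch <= "\u0939"


def is_dependent(ch):
    # Signs that always attach to the syllable already in progress:
    # the virama, the vowel signs AA..AU, and a few modifier signs.
    return ch == VIRAMA or "\u093E" <= ch <= "\u094C" or ch in MODIFIERS


def split_syllables(text):
    """Split Devanagari text into units of the form C C ... C V (simplified)."""
    syllables = []
    current = ""
    for ch in text:
        if is_dependent(ch):
            current += ch
        elif is_consonant(ch) and current.endswith(VIRAMA):
            # A consonant right after a virama continues the same cluster.
            current += ch
        else:
            # Anything else (pure vowel, new consonant, space) starts a new unit.
            if current:
                syllables.append(current)
            current = ch
    if current:
        syllables.append(current)
    return syllables


# "extra" spelled e + k-s-T-r-aa: one pure vowel, then one four-consonant syllable.
word = "\u090F\u0915\u094D\u0938\u094D\u091F\u094D\u0930\u093E"
print(split_syllables(word))
```

The sketch ignores several details a real segmenter would need (for example joiner characters and script-specific exceptions), so it should be read as a picture of the C C ... C V rule and nothing more.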

Writing systems for Indian languages therefore specify the manner in which the syllable is displayed.

The general principle followed in writing syllables is that a vowel in the syllable is always distinguished from its pure form (when it appears standalone). The pure form of a vowel is seen mostly as the first sound in a word though occasionally one will see pure vowels within a word.  Thus a specific displayed shape associated with a vowel which is part of a syllable with one or more consonants is generally known as Medial Vowel representation.  The Indian word for a Medial Vowel representation is "Matra". The shape of a medial vowel has little relationship with the shape of its pure form. Hence one must remember all the medial vowel shapes in a given script to be able to correctly identify the syllable.

The simplest of the syllables, consisting of just one consonant and a vowel, is displayed by combining the shape of the consonant and the shape of the Matra.

The writing system also provides for displaying a pure consonant (also known as a generic consonant) using a specific Matra so that in principle one may write a syllable as a sequence of generic consonant shapes ending with the final consonant vowel combination which may be shown using the medial vowel form. The specific Matra added to a consonant to make it generic is usually known as the "halanth".
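As a rough illustration of how a consonant, a Matra and the halanth combine, the sketch below builds syllables for Devanagari. It assumes Unicode encoding, which this document does not prescribe: there the halanth is the virama sign, and a consonant letter with no Matra is read with the inherent vowel "a". The names generic and syllable are hypothetical helpers.

```python
KA = "\u0915"      # DEVANAGARI LETTER KA
HALANT = "\u094D"  # DEVANAGARI SIGN VIRAMA, the "halanth"

# Medial vowel (Matra) signs for a few Devanagari vowels.
MATRAS = {
    "aa": "\u093E", "i": "\u093F", "ii": "\u0940",
    "u": "\u0941", "uu": "\u0942", "e": "\u0947",
    "ai": "\u0948", "o": "\u094B", "au": "\u094C",
}


def generic(consonant):
    """Return the generic (vowel-less) form of a consonant."""
    return consonant + HALANT


def syllable(consonants, vowel="a"):
    """Generic forms for all but the last consonant, then the Matra."""
    cluster = "".join(generic(c) for c in consonants[:-1]) + consonants[-1]
    # The vowel "a" is inherent in the consonant letter and needs no sign.
    matra = "" if vowel == "a" else MATRAS[vowel]
    return cluster + matra


# "ka" combined with each vowel, followed by the generic consonant.
for vowel in ["a"] + list(MATRAS):
    print(vowel, syllable([KA], vowel))
print("generic", generic(KA))
```

Whether a cluster built this way appears as a conjunct, as half forms or with a visible halanth is decided by the font and the rendering software, not by the stored sequence.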

Shown below are the representations of the first consonant "ka" when it forms a syllable with all the vowels in a language. Note that the actual number of vowels varies across the languages of India.

The syllables may be represented in Roman if appropriate diacritic marks are used. The International Phonetic Alphabet also provides symbols for the aksharas of Indian languages. The line below shows the diacritics for the vowels. In the Southern scripts, one will also see the short vowels corresponding to "e" and "o".

In these  representations, we see that the Southern scripts have the shorter forms for two vowels 'e' and 'o'. In Sanskrit and other North Indian languages, only the long 'e' and 'o' exist. The vowel "ru" which is a vocalic "r" is not used in Tamil. Also, in Tamil one has unique shapes for specific combinations of consonants with the vowel 'u' and its long form as well.

The last shown shape in each case is strictly not a syllable but the form of the generic consonant i.e., the consonant without any vowel. Sometimes, this is placed first in the ordering of consonant vowel combinations. The lexical ordering of the vowels is well defined but it is not precisely specified whether the generic consonant is placed first or at the end. The collation algorithms will have to keep this in mind when dealing with codes assigned for the syllables.
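One way a collation routine can keep this choice open is to make the position of the generic consonant a parameter. The sketch below is a toy model, not a description of any existing collation standard: syllables are written as (consonant, vowel) pairs with None standing for the generic consonant, and the short CONSONANT_ORDER and VOWEL_ORDER lists are illustrative assumptions.

```python
# Illustrative fragments of the lexical order; a real table would be complete.
CONSONANT_ORDER = ["k", "kh", "g", "gh"]
VOWEL_ORDER = ["a", "aa", "i", "ii", "u", "uu", "e", "ai", "o", "au"]


def collation_key(syllable, generic_first=False):
    """Sort key for a (consonant, vowel) pair; vowel None = generic consonant."""
    consonant, vowel = syllable
    if vowel is None:
        # The generic consonant sorts before or after all vowel combinations.
        vowel_rank = -1 if generic_first else len(VOWEL_ORDER)
    else:
        vowel_rank = VOWEL_ORDER.index(vowel)
    return (CONSONANT_ORDER.index(consonant), vowel_rank)


items = [("k", "i"), ("k", None), ("g", "a"), ("k", "a")]
with_generic_last = sorted(items, key=collation_key)
with_generic_first = sorted(items, key=lambda s: collation_key(s, generic_first=True))
print(with_generic_last)
print(with_generic_first)
```

Keeping the decision in one flag means the rest of the sorting code does not change when a different convention is adopted.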

We make several observations about the writing systems.

  • The shapes for the medial vowels depend on the script.
  • Across scripts there is no uniformity in the representation of the Matras. The Matra (a ligature) may be added to the basic shape of the consonant on the left, right, above, below or on both sides horizontally or vertically.
  • The phonetic value of the syllable may be the same in all the languages but its representation differs based on the script used.
  • In some specific cases (seen in Tamil), the shape of the basic consonant in the syllable may itself be altered.
  • For the same medial vowel, the positioning of its ligature also varies across scripts.
  • It may not be possible to identify the vowel from any one ligature for the Southern scripts. The same ligature may be used on one side only for some vowels but used with another ligature on the other side for other vowels. Bengali and Oriya also come under this.

When learning a script, it is important to know the shapes as well as the positioning of the Matras when any syllable is written.
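The last observation can be checked with a small experiment, assuming the text is held in Unicode (an assumption of this sketch, not something this document requires). Unicode defines the two-sided vowel signs of Tamil and Bengali as combinations of simpler signs, and Python's standard unicodedata module exposes that decomposition. The helper name matra_parts is hypothetical.

```python
import unicodedata


def matra_parts(matra):
    """Names of the simpler signs a vowel sign decomposes into (NFD)."""
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", matra)]


TAMIL_E, TAMIL_O = "\u0BC6", "\u0BCA"
BENGALI_E, BENGALI_O = "\u09C7", "\u09CB"

# The sign for "e" stands alone on the left of the consonant...
print(matra_parts(TAMIL_E))
# ...but the same sign reappears as the left half of the two-sided "o".
print(matra_parts(TAMIL_O))
print(matra_parts(BENGALI_E))
print(matra_parts(BENGALI_O))
```

Seeing the left-hand sign alone is therefore not enough to tell which vowel is meant; the right-hand side of the consonant has to be inspected as well.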

Syllables with more than one consonant

When more than one consonant is present in a syllable, the writing systems specify some basic rules about how the shape must be derived. The rules vary across the scripts but one observes the following types of representations.

1. The use of the half forms of the consonants.

 The approach is to display the half forms for the consonants for all but the last one in the syllable. This works basically for those scripts where there is a vertical line in the shape of the consonant. The half form is simply obtained by removing the vertical line. The vertical line is not removed however when it occurs in the middle of a consonant.

 Half forms are seen in Devanagari derived scripts. 

2. The use of one below the other form.

 In this approach, the consonants are written one below the other, the one on top being the first in the syllable and the rest written below in the same order of their occurrence.  The convention is that the consonant on top is drawn with the normal size but those written below reduced in size.  This allows better control of the space between lines of text. The one below the other form is more or less the standard for the Southern scripts though in Devanagari too, this form is used with consonants which have no vertical line in them.

 In Kannada and Telugu, special ligatures are used when specific consonants appear below. In the example above, the ligature for "la" is different from its basic form as a consonant. We may mention here that in the Grantha script (the script used in the past in South India for writing Sanskrit), some three-consonant conjuncts are also written one below the other in three layers.

3. The use of specific shapes for some syllables.

 Here, the syllable has to be individually identified from the displayed shape which may not really resemble any of the consonants in it. Often such shapes are obtained by fusing together the shapes of the individual consonants and one may be able to discern the individual consonants only with some effort. Quite a number of syllables are written this way in Devanagari.

4. The use of special forms for the consonants  "ra"  and "ya".

 This applies to most scripts where the consonant "ra" or "ya" appears in a syllable. In Devanagari, more than one shape for "ra" is applicable, depending on whether it comes first in a syllable or at the end and also on whether the preceding consonant has a vertical line. In most scripts a very special ligature applies when "ya" comes as the last consonant in the syllable. Syllables with "ra" are by far the most complex to render, since the rules vary widely across the scripts. Another point to remember is that invariably, the sequence of ligatures will not conform to the linguistic order of the consonants in the syllable.

5. Displaying a syllable through generic forms of the  consonants.

 This is a very basic and perfectly acceptable approach to displaying a syllable. Here, the sequence of sounds is just reflected in the sequence of generic consonants displayed. Devanagari and other scripts usually allow this form of representation when it may not be possible to print the syllable using appropriate ligatures due to their absence in the "Type" used in printing. In Tamil and to some extent in current day Malayalam, syllables are shown with the generic forms of the consonants with the matra added to the last consonant in the syllable.

 To give a few examples,

 It must be remembered that convention demands that ligatures which have remained in use for a very long time be continued to the extent permitted by the "Type" used in printing. For proper rendering of most scripts, up to five or six hundred glyphs may be required, but eight-bit fonts on computers, which offer at most 256 glyph codes, impose restrictions on this.
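Where text is stored in Unicode (an assumption of this sketch; the document itself does not prescribe an encoding), the choice between a fused conjunct, a half form (approach 1) and generic forms (approach 5) can be hinted at in the stored sequence. The Unicode Standard describes placing a zero width joiner after the virama to request a half form, and a zero width non-joiner to request the generic consonant with a visible halanth. The function names below are hypothetical.

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER


def conjunct(first, second):
    """Plain sequence: the font may show its fused or stacked conjunct shape."""
    return first + VIRAMA + second


def half_form(first, second):
    """Request the half form of the first consonant."""
    return first + VIRAMA + ZWJ + second


def generic_form(first, second):
    """Request the generic first consonant with a visible halanth."""
    return first + VIRAMA + ZWNJ + second


KA, SSA = "\u0915", "\u0937"  # DEVANAGARI LETTER KA, DEVANAGARI LETTER SSA
for label, build in [("conjunct", conjunct),
                     ("half form", half_form),
                     ("generic form", generic_form)]:
    text = build(KA, SSA)
    print(label, text, " ".join(f"U+{ord(ch):04X}" for ch in text))
```

These are requests only. What is finally drawn depends on the glyphs available in the font and on the rendering software, which is the same constraint the "Type" imposed in printing.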

Adding Matras to the consonants in a syllable

Since a proper syllable always has a vowel in it, the rule for adding the vowel basically follows the rules for adding Matras to a consonant. There will however be exceptions to this rule. In Devanagari derived scripts, the Matra is added to the last consonant, except when the Matra is one that is written before the consonant, in which case it is placed at the very beginning of the syllable. In Telugu and Kannada, the Matra goes with the consonant written at the top.

Dealing with longer syllables

Ligatures are usually used with combinations of two or at most three consonants. When a syllable has more than three consonants, it may be necessary to display it using the same principles used for two consonants, such as half forms or the one below the other form. In most languages, longer syllables occur only with "ra" and "ya". Arbitrarily long syllables do not make sense in practice even if words are formed with them.

Thus we observe that a syllable may be displayed with different ordering of the ligatures depending on the script used.  From the order of the ligatures, it will not be easy to identify the syllable unless one knows the rules for the writing system.

The linguistic ordering of the consonants and the vowel may not be adhered to in the ordering of the ligatures corresponding to the consonants and the vowel. This complicates text processing when text is represented in terms of codes for the ligatures, as is done with applications which work with font glyphs directly.
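The difference between the two orderings can be sketched for one simple case, the Devanagari Matra for "i", which is written before the consonants it belongs to. The code is a deliberately reduced model: it moves Unicode code points around, whereas a real renderer works on font glyphs and handles many more rules. The function names and the PRE_BASE_MATRAS set are hypothetical.

```python
# Matras drawn to the left of the consonant(s); only "i" is modelled here.
PRE_BASE_MATRAS = {"\u093F"}  # DEVANAGARI VOWEL SIGN I


def display_order(syllable):
    """Linguistic order -> order in which the shapes appear (simplified)."""
    if syllable and syllable[-1] in PRE_BASE_MATRAS:
        # The Matra is uttered last but drawn at the very beginning.
        return syllable[-1] + syllable[:-1]
    return syllable


def linguistic_order(displayed):
    """Undo display_order: what glyph-coded text must do before processing."""
    if displayed and displayed[0] in PRE_BASE_MATRAS:
        return displayed[1:] + displayed[0]
    return displayed


sthi = "\u0938\u094D\u0925\u093F"  # s + halanth + th + i, in the order uttered
shown = display_order(sthi)
print([f"U+{ord(ch):04X}" for ch in sthi])
print([f"U+{ord(ch):04X}" for ch in shown])
```

Text held in display order cannot be searched or sorted by sound until something like linguistic_order has been applied to every syllable, which is the complication described above.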

Let us see this through some additional examples.

Consider the English word "extra". There are two syllables here. The beginning "e" by itself constitutes one, while "xtra" constitutes the second. This word is shown written in four different scripts, namely Devanagari, Bengali, Tamil and Telugu.

When reckoned in terms of basic sounds, "xtra" is a syllable with four basic consonants and one vowel.

In Devanagari, the syllable is written using "half forms" for the first two consonants and a special form for the combination of the last two. The same is true in Bengali. One observes that the order in which these half forms and combination forms occur, conforms to the order in which the sounds are uttered, in this case, k s t r A (A is used to indicate the long vowel).

In Tamil, the syllable is shown using only the generic forms of the consonants, distinguished by the use of the dot above the consonant. In Tamil too, the order in which the shapes occur in the syllable, conforms to the order of the sounds.

Telugu shows a departure. The syllable starts with a ligature representing "ra" even though "ra" is the last consonant in the syllable. In earlier times, this ligature used to be written below and not in the beginning. Also we observe that the Matra for "aa" actually goes with the consonant "ka" though it belongs to "ra".

In the Southern scripts (except Tamil), it is often the case that the order of displayed shapes differs from the order of sounds in the syllable.  Those familiar with the conventions of the writing systems will readily see what the syllable is by automatically associating the ligatures with the required order of pronunciation.
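If the syllable is stored by sound rather than by shape, the same sequence k s t r A can be produced for all four scripts, leaving the script-specific display order to the rendering step. The sketch below assumes Unicode, whose blocks for these scripts place corresponding letters at the same offset from the start of each block. It also assumes that the "t" of "extra" is written with the retroflex consonant; the example images are not available to confirm this. The function name encode_syllable is hypothetical.

```python
import unicodedata

# First code point of each script's Unicode block.
BLOCK_BASE = {
    "Devanagari": 0x0900,
    "Bengali": 0x0980,
    "Tamil": 0x0B80,
    "Telugu": 0x0C00,
}

# Offsets shared by the four blocks ("T" is the retroflex t, an assumption).
OFFSET = {"k": 0x15, "s": 0x38, "T": 0x1F, "r": 0x30}
VIRAMA_OFFSET = 0x4D  # the halanth
AA_OFFSET = 0x3E      # Matra for the long vowel A


def encode_syllable(script, consonants, matra_offset):
    """Consonants joined by the halanth, then the Matra, in linguistic order."""
    base = BLOCK_BASE[script]
    letters = [chr(base + OFFSET[c]) for c in consonants]
    return chr(base + VIRAMA_OFFSET).join(letters) + chr(base + matra_offset)


for script in BLOCK_BASE:
    text = encode_syllable(script, ["k", "s", "T", "r"], AA_OFFSET)
    print(script, text, unicodedata.name(text[0]))
```

The stored order is identical in every script. Whether the result is drawn with half forms, with dotted generic consonants, or with the "ra" ligature moved to the front is left to the font and the rendering software for each script.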

To get a computer program to associate an appropriate displayed shape with a syllable is therefore a difficult task, especially if the application has to cater to multilingual text. 

The complexities of the writing systems followed for Indian languages are discussed in detail on a separate page. The different rules followed for different scripts are explained in a comprehensive manner so as to bring out the similarities as well as the variations across the major scripts of India. This discussion will help designers of fonts and others who may want to develop applications supporting user interfaces in different Indian languages.

The current page has specific examples which bring out the variations seen across the scripts.

