Writing Systems of India
This document presents a
brief introduction to the writing systems followed for the languages of
India. Though all the languages
of India use a nearly common set of consonants and vowels, there are significant
differences between the writing systems characterized by their scripts.
In the past, a language was not always associated with a specific script
though language specific scripts had come into vogue. Today, nine scripts
are in use and these cover all the National languages. The Urdu script
is not covered in this document.
Writing systems for the different
languages of India are based on the representation of syllables as opposed
to the use of the letters of the alphabet as seen in English or other European
languages. Text written in Indian scripts has the general structure,
S1 S2 S3 ... SN
where each "S" is
an appropriate representation for a specific syllable. The displayed
form of a syllable varies depending on the script used for the writing
system and it is perfectly acceptable that a given language may be written
in more than one script. Thus in Indian languages, it is not necessary
that a specific script is associated only with a specific language.
For instance, the Telugu or Malayalam scripts may well be used for Sanskrit,
as has been the practice for centuries.
The guiding principle underlying
the writing systems is basically that the displayed shape for a syllable
can be generated using certain rules which combine the shapes of individual
consonants and the vowel the syllable is made of. When a syllable
is pronounced, it is strictly in the order of the consonants terminated
by the vowel which always combines with the last consonant in the syllable.
In linguistic parlance, the displayed shape for a syllable is generally
referred to as a consonant cluster or sometimes a conjunct.
Each syllable is made up
of one or more consonants and a vowel. A pure vowel is also reckoned as
a syllable. A pure consonant does not have any vowel associated with it
and is more like a phoneme. Pure consonants are difficult to pronounce
in the absence of a combining vowel or other consonants. For this reason,
linguistic scholars in India have always referred to a consonant as the
physical body (lifeless) and the vowel as the life giving agent. A language
cannot exist with only vowels or consonants. The combination is actually
the essence of a language and these combinations are nothing but sounds
that distinguish languages of the world.
We can represent a syllable
in the form CC--V, that is, a sequence of consonants (C) terminated by a vowel (V).
While this may imply that
arbitrary syllables may be formed, it is generally impractical to pronounce
such combinations. So in reality, there is a finite set of these CC--V
combinations which are meaningful in any given Indian language. The majority
of the syllables in Indian languages are known to consist of just two or
three consonants though occasionally one finds four or even five consonants
making a syllable.
Writing systems for Indian
languages therefore specify the manner in which the syllable is displayed.
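The CC--V structure can be illustrated with a short Python sketch that splits a Devanagari word into syllables. This is only a sketch under stated assumptions: the text is taken to be encoded in Unicode (an encoding this document does not otherwise discuss), where a consonant that continues a cluster is followed by the halant sign U+094D and the vowel "a" is implied when no vowel sign follows. The function name split_syllables is hypothetical, and the input is assumed to be a single Devanagari word.

```python
HALANT = "\u094d"  # Devanagari sign that strips the vowel from a consonant


def is_consonant(ch):
    # Devanagari consonants ka (U+0915) to ha (U+0939)
    return "\u0915" <= ch <= "\u0939"


def is_independent_vowel(ch):
    # Pure (standalone) vowels U+0904 to U+0914
    return "\u0904" <= ch <= "\u0914"


def split_syllables(word):
    """Split a Devanagari word into CC--V units (assumed Unicode input)."""
    syllables = []
    current = ""
    for ch in word:
        if is_consonant(ch):
            # A consonant extends the cluster only when the previous
            # character was a halant; otherwise a new syllable starts.
            if current and not current.endswith(HALANT):
                syllables.append(current)
                current = ""
            current += ch
        elif is_independent_vowel(ch):
            # A pure vowel is a syllable on its own.
            if current:
                syllables.append(current)
            current = ch
        else:
            # Vowel signs, halant and other marks attach to the current unit.
            current += ch
    if current:
        syllables.append(current)
    return syllables


# "extra" written in Devanagari: e + k-s-t-r-aa
word = "\u090f\u0915\u094d\u0938\u094d\u091f\u094d\u0930\u093e"
print(split_syllables(word))
```

For this word the function returns two units: the pure vowel "e" and the four-consonant cluster ending in the long vowel.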
The general principle followed
in writing syllables is that a vowel in the syllable is always distinguished
from its pure form (when it appears standalone). The pure form of a vowel
is seen mostly as the first sound in a word though occasionally one will
see pure vowels within a word. Thus a specific displayed shape associated
with a vowel which is part of a syllable with one or more consonants is
generally known as Medial Vowel representation. The Indian word for
a Medial Vowel representation is "Matra". The shape of a medial vowel has
little relationship with the shape of its pure form. Hence one must remember
all the medial vowel shapes in a given script to be able to correctly identify
a syllable.
The simplest of the syllables
consisting of just one consonant and a vowel is displayed by combining
the shape of the consonant and the shape of the Matra.
The writing system also provides
for displaying a pure consonant (also known as a generic consonant) using
a specific Matra so that in principle one may write a syllable as a sequence
of generic consonant shapes ending with the final consonant vowel combination
which may be shown using the medial vowel form. The specific Matra added
to a consonant to make it generic is usually known as the "halanth".
Shown below are the representations
of the first consonant "ka" when it forms a syllable with all the vowels
in a language. It may be remembered that the actual number of vowels varies
across the languages of India.
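As a rough illustration, the consonant-vowel combinations for "ka" can be generated in Python, assuming the Unicode encoding of Devanagari (an assumption, since this document does not name an encoding). In that encoding the bare consonant letter carries the vowel "a", each Matra is a separate code point appended to the consonant, and the halanth is U+094D. The name consonant_vowel_forms is hypothetical, and the vowel list covers only the ten common Devanagari Matras.

```python
KA = "\u0915"      # Devanagari consonant ka
HALANT = "\u094d"  # sign producing the generic (vowel-less) consonant

# Matras for aa, i, ii, u, uu, vocalic r, e, ai, o, au
MATRAS = [
    "\u093e", "\u093f", "\u0940", "\u0941", "\u0942",
    "\u0943", "\u0947", "\u0948", "\u094b", "\u094c",
]


def consonant_vowel_forms(consonant):
    """Return the consonant with each vowel, ending with the generic form."""
    forms = [consonant]                        # inherent vowel "a", no Matra
    forms += [consonant + m for m in MATRAS]   # consonant + medial vowel
    forms.append(consonant + HALANT)           # generic consonant, placed last
    return forms


print(" ".join(consonant_vowel_forms(KA)))
```

How the shapes appear when printed depends on the font and rendering software available; the code only builds the character sequences.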
The syllables may be represented
in Roman if appropriate diacritic marks are used. The International Phonetic
Alphabet also provides symbols for the aksharas of Indian languages. The
line below shows the diacritics for the vowels. In the Southern scripts,
one will also see the short vowels corresponding to "e" and "o".
In these representations,
we see that the Southern scripts have the shorter forms for two vowels
'e' and 'o'. In Sanskrit and other North Indian languages, only the long
'e' and 'o' exist. The vowel "ru" which is a vocalic "r" is not used in
Tamil. Also, in Tamil one has unique shapes for specific combinations of
consonants with the vowel 'u' and its long form as well.
The last shown shape in each
case is strictly not a syllable but the form of the generic consonant i.e.,
the consonant without any vowel. Sometimes, this is placed first in the
ordering of consonant vowel combinations. The lexical ordering of the vowels
is well defined but it is not precisely specified whether the generic consonant
is placed first or at the end. The collation algorithms will have to keep
this in mind when dealing with codes assigned for the syllables.
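One way a collation routine might leave this choice open is sketched below in Python. The sketch assumes Unicode Devanagari, handles only syllables made of a single consonant plus an optional Matra or halanth, and assumes that code point order is an acceptable ordering for the consonants themselves. The names sort_key and generic_first are hypothetical.

```python
HALANT = "\u094d"

# Lexical order of the vowels: a (no sign), aa, i, ii, u, uu,
# vocalic r, e, ai, o, au
MATRA_ORDER = [
    "", "\u093e", "\u093f", "\u0940", "\u0941", "\u0942",
    "\u0943", "\u0947", "\u0948", "\u094b", "\u094c",
]


def sort_key(syllable, generic_first=True):
    """Sort key for a one-consonant syllable (consonant + optional sign).

    generic_first decides whether the generic consonant sorts before
    or after all consonant-vowel combinations.
    """
    consonant, sign = syllable[0], syllable[1:]
    if sign == HALANT:
        rank = -1 if generic_first else len(MATRA_ORDER)
    else:
        rank = MATRA_ORDER.index(sign)  # raises ValueError for unknown signs
    return (consonant, rank)


forms = ["\u0915\u093f", "\u0915\u094d", "\u0915", "\u0915\u093e"]
print(sorted(forms, key=sort_key))
print(sorted(forms, key=lambda s: sort_key(s, generic_first=False)))
```

Making the position of the generic consonant a parameter keeps the vowel ordering fixed while allowing either convention.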
We make several observations
about the writing systems.
The shapes for the medial vowels
depend on the script.
Across scripts there is no uniformity
in the representation of the Matras. The Matra (a ligature) may be added
to the basic shape of the consonant on the left, right, above, below or
on both sides horizontally or vertically.
The phonetic value of the syllable
may be the same in all the languages but its representation differs based
on the script used.
In some specific cases (seen
in Tamil), the shape of the basic consonant in the syllable may itself change.
For the same medial vowel, the
positioning of its ligature also varies across scripts.
It may not be possible to identify
the vowel from any one ligature for the Southern scripts. The same ligature
may be used on one side only for some vowels but used with another ligature
on the other side for other vowels. Bengali and Oriya also come under this.
When learning a
script, it is important to know the shapes as well as the positioning of
the Matras when any syllable is written.
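The observation about ligatures shared between vowels can be checked with a small Python sketch, assuming Unicode encoding. In Unicode, several two-part Matras of Tamil and Bengali have a canonical decomposition into a left-side part and a right-side part, and the standard unicodedata module exposes it through NFD normalization. The function name matra_parts and the choice of examples are hypothetical.

```python
import unicodedata

# A few Matras that are drawn on both sides of the consonant
TWO_PART_EXAMPLES = {
    "Tamil o": "\u0bca",
    "Tamil oo": "\u0bcb",
    "Bengali o": "\u09cb",
    "Bengali au": "\u09cc",
}


def matra_parts(matra):
    """Return the component signs of a Matra (one item if it is not split)."""
    return list(unicodedata.normalize("NFD", matra))


for name, matra in TWO_PART_EXAMPLES.items():
    parts = [f"U+{ord(p):04X}" for p in matra_parts(matra)]
    print(name, parts)
```

The Tamil sign for "o", for example, decomposes into the sign used on the left for short "e" (U+0BC6) and the sign used on the right for "aa" (U+0BBE), so the left-hand ligature alone does not identify the vowel.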
Syllables with more than one consonant
When more than one consonant
is present in a syllable, the writing systems specify some basic rules
about how the shape must be derived. The rules vary across the scripts
but one observes the following types of representations.
1. The use of the half forms of the consonants.
The approach is to
display the half forms for the consonants for all but the last one in the
syllable. This works basically for those scripts where there is a vertical
line in the shape of the consonant. The half form is simply obtained by
removing the vertical line. The vertical line is not removed however when
it occurs in the middle of a consonant.
Half forms are seen
in Devanagari derived scripts.
2. The use of the one below the other form.
In this approach, the
consonants are written one below the other, the one on top being the first
in the syllable and the rest written below in the same order of their occurrence.
The convention is that the consonant on top is drawn with the normal size
but those written below are reduced in size. This allows better control
of the space between lines of text. The one below the other form is more
or less the standard for the Southern scripts though in Devanagari too,
this form is used with consonants which have no vertical line in them.
and Telugu, special ligatures are used, when specific consonants appear
below. In the example above, the ligature for "la" is different from its
basic form as a consonant. We may mention here that in the Grantha
script (the script used in the past in South India for writing Sanskrit),
some three consonant conjuncts are also written one below the other in
3. The use of specific shapes for some syllables.
Here, the syllable
has to be individually identified from the displayed shape which may not
really resemble any of the consonants in it. Often such shapes are obtained
by fusing together the shapes of the individual consonants and one may
be able to discern the individual consonants only with some effort. Quite
a number of syllables are written this way in Devanagari.
4. The use of special
forms for the consonants "ra" and "ya".
This applies to most
scripts where the consonant "ra" or "ya" appears in a syllable. In
Devanagari more than one shape for "ra" is applicable depending on
whether it comes first in a syllable or at the end and also whether the
preceding consonant has a vertical line. In most scripts a very special
ligature applies when "ya" comes as the last consonant in the syllable.
By far, syllables with "ra" are the most complex in respect of rendering
since the rules vary widely across the scripts. Another point to remember
is that invariably, the sequence of ligatures will not conform to the linguistic
order of the consonants in the syllable.
5. Displaying a syllable through generic forms of the consonants.
This is a very basic
and perfectly acceptable approach to displaying a syllable. Here, the sequence
of sounds is just reflected in the sequence of generic consonants displayed.
Devanagari and other scripts usually allow this form of representation
when it may not be possible to print the syllable using appropriate ligatures
due to their absence in the "Type" used in printing. In Tamil and to some
extent in current day Malayalam, syllables are shown with the generic forms
of the consonants with the matra added to the last consonant in the syllable.
To give a few examples,
It must be
remembered that convention demands that ligatures which have remained in
use for a very long time be continued to the extent permitted by
the "Type" used in printing. For proper rendering of most scripts,
up to five or six hundred glyphs may be required, but the use of eight-bit
fonts in computers does impose restrictions on this.
Adding Matras to the consonants in a syllable
Since a proper syllable
always has a vowel in it, the rule for adding the vowel basically follows
the rules for adding Matras to a consonant. There will however be exceptions
to this rule. In Devanagari derived scripts, the Matra is added to the
last consonant except when it occurs before the consonant, in which case
it is placed at the very beginning. In Telugu and Kannada, the Matra goes
with the consonant written at the top.
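The Devanagari exception can be sketched in Python as a reordering step from linguistic order to display order. The sketch assumes the syllable is stored in Unicode in linguistic order (consonants first, Matra last) and treats only the short "i" Matra (U+093F), the common Matra written before the consonant, as the case to move. The function name display_order is hypothetical, and real rendering software performs this step internally with many more rules.

```python
I_MATRA = "\u093f"  # Devanagari short "i", drawn to the left of the consonant


def display_order(syllable):
    """Return the characters of a syllable in left-to-right display order.

    The input is in linguistic order. A trailing short "i" Matra is moved
    to the very beginning; any other syllable is returned unchanged.
    """
    if syllable.endswith(I_MATRA):
        return I_MATRA + syllable[:-1]
    return syllable


# "sti": s + halant + t + i in linguistic order
sti = "\u0938\u094d\u0924\u093f"
print([f"U+{ord(c):04X}" for c in display_order(sti)])
```

An application that stores text as codes for the displayed ligatures would have to undo this step before it could search, sort or split the text by sound.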
with longer syllables
Ligatures are usually used
with combinations of two or at most three consonants. When a syllable has
more than three consonants, it may be necessary to display the same using
the same principles used for two consonants such as the use of half forms
or the one below the other form etc. In most languages, longer syllables
occur only with "ra" and "ya". Arbitrarily long syllables do not make
sense in practice even if words are formed with them.
Thus we observe
that a syllable may be displayed with different ordering of the ligatures
depending on the script used. From the order of the ligatures, it
will not be easy to identify the syllable unless one knows the rules for
the writing system.
The linguistic ordering of
the consonants and the vowel may not be adhered to in the ordering of the
ligatures corresponding to the consonants and the vowel. This complicates
text processing when text is represented in terms of codes for the ligatures,
as is done with applications which work with font glyphs directly.
Let us see this through some examples.
Consider the English word
"extra". There are two syllables here. The beginning "e" by itself constitutes
one while the "xtra" constitutes the second. This word is shown written
in four different scripts, namely Devanagari, Bengali, Tamil and Telugu.
When reckoned in
terms of basic sounds, "xtra" is a syllable with four basic consonants
and one vowel.
In Devanagari, the syllable
is written using "half forms" for the first two consonants and a special
form for the combination of the last two. The same is true in Bengali.
One observes that the order in which these half forms and combination forms
occur, conforms to the order in which the sounds are uttered, in this case,
k s t r A (A is used to indicate the long vowel).
In Tamil, the syllable is
shown using only the generic forms of the consonants, distinguished by
the use of the dot above the consonant. In Tamil too, the order in which
the shapes occur in the syllable, conforms to the order of the sounds.
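The sound sequence k s t r A can be turned into a stored syllable for Devanagari and Tamil with the Python sketch below. It assumes Unicode encoding, in which every consonant except the last is followed by the script's halanth sign (U+094D in Devanagari, the dot U+0BCD in Tamil) and the Matra comes last. The table SCRIPTS and the function compose are hypothetical, and mapping English "t" to the retroflex consonant is an assumption based on common practice.

```python
# Letters needed for "xtra" in two scripts (assumed Unicode code points)
SCRIPTS = {
    "Devanagari": {
        "k": "\u0915", "s": "\u0938", "t": "\u091f", "r": "\u0930",
        "A": "\u093e",        # Matra for the long vowel "aa"
        "halant": "\u094d",
    },
    "Tamil": {
        "k": "\u0b95", "s": "\u0bb8", "t": "\u0b9f", "r": "\u0bb0",
        "A": "\u0bbe",
        "halant": "\u0bcd",   # the dot placed above a generic consonant
    },
}


def compose(script, consonants, vowel):
    """Build a syllable in linguistic order: C halant C halant ... C Matra."""
    table = SCRIPTS[script]
    body = table["halant"].join(table[c] for c in consonants)
    return body + table[vowel]


for script in SCRIPTS:
    print(script, compose(script, "kstr", "A"))
```

The stored sequence is the same in both scripts apart from the code points. Whether it is shown with half forms, a combined form, or dotted generic consonants is decided by the script's conventions and the font at display time.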
Telugu shows a departure.
The syllable starts with a ligature representing "ra" even though "ra"
is the last consonant in the syllable. In earlier times, this ligature
used to be written below and not in the beginning. Also we observe that
the Matra for "aa" actually goes with the consonant "ka" though it belongs
In the Southern scripts (except
Tamil), it is often the case that the order of displayed shapes differs
from the order of sounds in the syllable. Those familiar with the
conventions of the writing systems will readily see what the syllable is
by automatically associating the ligatures with the required order of pronunciation.
To get a computer program
to associate an appropriate displayed shape with a syllable is therefore
a difficult task, especially if the application has to cater to multilingual