Idiosyncrasies of Writing systems in India.

Writing systems followed in India are considered complex on account of the rules which specify how a syllable should be written. The reader is advised to look at the page discussing the principles of writing systems before looking at the current page which concentrates on the problem of rendering syllables on a computer.

By and large, most languages of India follow a syllabic writing system, which represents syllables rather than pure consonants and vowels. Though there can be thousands of syllables, the writing systems generally follow rules by which the syllables are shaped. These rules allow a syllable to be built up from a smaller set of shapes, which includes the vowels, the consonants and the representations for the medial vowels. This smaller set is usually made available in a font, and on a computer a syllable is typically shaped by placing the glyphs in the required order.

It will help if we specify the manner in which a syllable is shaped by examining the structure of the syllable.

A syllable may be made up of

1. A pure vowel.

  This usually applies to a vowel appearing at the beginning of a word, though in some languages, a pure vowel may be seen inside a word. A pure vowel has a unique shape and is written using this shape wherever it occurs.

2. A consonant with an implied "ah". 

  The consonants of our languages cannot be pronounced easily unless a vowel is attached to the consonant or other consonants  follow. Unlike in western scripts where a consonant is always written in its generic form, consonants in India are almost always written with an implied "ah" so that one can pronounce an independent consonant directly without having to refer to it by a name (unlike in western languages where each letter has a name). 

  e.g., in English "m" is normally referred to as "em", and only when an "a" comes with it, as in "ma", will one say it as "ma". In Sanskrit (and in other Indian languages), when you see the consonant 'm', you will know that it is to be pronounced "ma".

  This subtle distinction has to be retained when a child is taught the writing system.

  In Indian scripts, a generic consonant occurs only as part of a syllable and not by itself, except that a word may end in a generic consonant. Hence the writing convention includes a special form for the generic consonant, obtained by attaching a "halanth" ligature. So m¯ is the generic form of m, but it is not easy to pronounce it by itself. (Try saying "hmm".)
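As a rough illustration of the halanth convention, the sketch below assumes the text is stored in Unicode, where Devanagari encodes the halanth as a separate sign (called VIRAMA in the standard) that follows the consonant. The choice of Devanagari and the helper name `generic_form` are assumptions made for this example only.

```python
import unicodedata

MA = "\u092E"      # DEVANAGARI LETTER MA, read as "ma" (implied "ah")
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA, i.e. the halanth


def generic_form(consonant: str) -> str:
    """Return the generic (vowel-less) form: consonant followed by halanth."""
    return consonant + VIRAMA


if __name__ == "__main__":
    # List the code points that make up the generic form of "m".
    for ch in generic_form(MA):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The generic form is therefore two code points in storage, even though it is displayed as a single shape.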

  A pure consonant is written using the shape assigned to the consonant.

3.  A consonant vowel combination.

  In India, one refers to the consonant as the body and the vowel as the one that gives a consonant its life. Hence the vowel symbolically represents life.

  This simple syllable is almost always written by adding a ligature to the shape of the consonant, and the ligature used depends on the vowel. This medial vowel representation has specific forms in specific scripts. There are exceptions to this rule as well in some of the scripts (Tamil and Malayalam).

In the above, we see three scripts where the syllables with "ta" have been formed with all the vowels. Notice that in Tamil, the Matra (ligature) can have components on both sides of the consonant while in Telugu, the components may be written above and below the consonant as well as on one side. 
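The way a consonant combines with each vowel can be sketched in code. The example below assumes Unicode Devanagari, where each matra is a separate dependent vowel sign stored after the consonant. The table `MATRAS` (a selection of common vowels, not the full set) and the function `syllables_for` are hypothetical names introduced for this illustration.

```python
TA = "\u0924"  # DEVANAGARI LETTER TA, read as "ta"

# Dependent vowel signs (matras), keyed by a rough romanisation.
MATRAS = {
    "a": "",         # implied vowel: nothing is added to the consonant
    "aa": "\u093E",
    "i": "\u093F",
    "ii": "\u0940",
    "u": "\u0941",
    "uu": "\u0942",
    "e": "\u0947",
    "ai": "\u0948",
    "o": "\u094B",
    "au": "\u094C",
}


def syllables_for(consonant: str) -> dict:
    """Build the consonant-vowel syllables for one consonant."""
    return {vowel: consonant + sign for vowel, sign in MATRAS.items()}


if __name__ == "__main__":
    for vowel, syllable in syllables_for(TA).items():
        print(f"t+{vowel}: {syllable}")
```

In storage the matra always follows the consonant. Where it is drawn (to the left, to the right, above or below) is decided at display time, which is why the sign for "i" appears to the left of the consonant in Devanagari although it is stored after it.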

4.  Two or more consonants and a vowel.

  Very simply, we can say this conforms to the ccv, cccv,  ccccv  etc.  format.

  It will be useful to point out here that one cannot really have arbitrarily long syllables. It will become almost impossible to pronounce them. By and large two and three consonant syllables are common and very few with four or five consonants. One sees long syllables even in English (Angstrom!)

  Across all the languages of India, approximately eight hundred to a thousand syllables (with the implied vowel "ah") are known to be present in spoken and written form. Since a basic syllable can include any of the vowels, the number of actual syllables will be of the order of about eight thousand, for all the vowels may not be seen with a base syllable which has two or more consonants in it.
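The four syllable types listed above can be recognised mechanically. The following is a minimal sketch, assuming Unicode Devanagari input and using character ranges chosen for this example. It ignores signs such as the anusvara and the nukta, so it is not a complete syllable segmenter. The names `SYLLABLE`, `split_syllables` and `consonant_count` are hypothetical.

```python
import re

# Assumed Unicode ranges for Devanagari (not taken from the text above).
CONSONANT = "\u0915-\u0939"  # KA .. HA
VOWEL = "\u0905-\u0914"      # independent (pure) vowels A .. AU
MATRA = "\u093E-\u094C"      # dependent vowel signs AA .. AU
VIRAMA = "\u094D"            # halanth

# Either zero or more (consonant + halanth) pairs, a final consonant and an
# optional matra or closing halanth, or else a single pure vowel.
SYLLABLE = re.compile(
    f"(?:[{CONSONANT}]{VIRAMA})*[{CONSONANT}][{MATRA}{VIRAMA}]?"
    f"|[{VOWEL}]"
)


def split_syllables(text: str) -> list:
    """Split a Devanagari word into syllables of the form v, c, cv, ccv, ..."""
    return SYLLABLE.findall(text)


def consonant_count(syllable: str) -> int:
    """Count the consonants that make up one syllable."""
    return sum(1 for ch in syllable if "\u0915" <= ch <= "\u0939")


if __name__ == "__main__":
    word = "\u0938\u094D\u0924\u094D\u0930\u0940"  # s + t + r + ii
    for syllable in split_syllables(word):
        print(syllable, consonant_count(syllable))
```

Under these assumptions the sample word is a single cccv syllable with three consonants.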

Rules for generating the display

1.   A pure vowel or a basic consonant has an individual shape associated with it. This shape has evolved over a period of time but one does find significant variations in older manuscripts. A pure vowel or a basic consonant is always displayed by drawing the associated shape.

  The forms for all the vowels and pure consonants are defined uniquely in each script.

2.  A consonant vowel combination is written with a Matra (ligature) attached to the basic consonant. The Matra may be drawn on either side of the consonant and in some cases, it is written on both sides or above and below a consonant. This applies to Tamil, Telugu, Malayalam, Bengali and older scripts such as Grantha.

  Now, it is also true that in Tamil and Malayalam, there is no specific matra in respect of the vowels "uh" and its long version. No matras are applicable here and these will have to be remembered as exceptions.
In most scripts, there will be such exceptions for specific combinations and these exceptions will have to be kept in mind when rendering the syllable.
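A matra written on both sides of the consonant can be seen directly in encoded text. The sketch below assumes Unicode Tamil, where the two-sided vowel signs have canonical decompositions into a left part and a right part, and uses Python's standard `unicodedata` module to expose them. The names `TWO_SIDED` and `matra_parts` are hypothetical.

```python
import unicodedata

TAMIL_TA = "\u0BA4"  # TAMIL LETTER TA

# Tamil vowel signs that are drawn on both sides of the consonant.
TWO_SIDED = {
    "o": "\u0BCA",   # TAMIL VOWEL SIGN O
    "oo": "\u0BCB",  # TAMIL VOWEL SIGN OO
    "au": "\u0BCC",  # TAMIL VOWEL SIGN AU
}


def matra_parts(sign: str) -> list:
    """Split a two-sided matra into its parts using canonical decomposition."""
    return list(unicodedata.normalize("NFD", sign))


if __name__ == "__main__":
    for vowel, sign in TWO_SIDED.items():
        names = [unicodedata.name(part) for part in matra_parts(sign)]
        print(f"{TAMIL_TA + sign} (t+{vowel}):", " + ".join(names))
```

A renderer working with a simple font would place the first part before the consonant and the second part after it, even though the stored text keeps the consonant first.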

3. The shape for a consonant in a syllable may be roughly specified by applying the rules observed in practice for each script. These rules vary across scripts. Some of the rules are explained below.

  The half form of a consonant is normally used in many cases, especially with scripts which are closer to Devanagari e.g., Gujarati. The half form is also referred to as the joining form. Usually, the half form has enough resemblance to the full form of the consonant.

  However, the half form is not defined for all the consonants, especially those which do not have a vertical stroke in them (Devanagari). Several consonants which do not have a clearly defined half form are shown in the figure above. In these cases, a form diminished in size, but drawn so that the consonants can be written one below the other, is considered useful. Again, examples are seen in the figure above.

  The one below the other form is actually the default for South Indian scripts, except Tamil. In these, there is no half form for a consonant. The first consonant in the syllable is written first, the second is written below in reduced size and the third may also get written below this combination. Since one seldom finds arbitrarily long syllables and most of the three or four consonant syllables end with "ra" or "ya", the actual need to write three consonants one below the other arises only rarely. The syllables with "ra" or "ya" as the last consonant have a special form for them.
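The distinction between what is stored and what is drawn can be sketched as follows, assuming Unicode text. A multi-consonant syllable is stored as consonants joined by the halanth (VIRAMA), and whether the result appears as half forms or as a stack is left to the font and the shaping software. Unicode also lets a half form be requested explicitly by adding a ZERO WIDTH JOINER after the halanth. The function names `conjunct` and `half_form` are hypothetical, and Devanagari is used only as an example.

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halanth)
ZWJ = "\u200D"     # ZERO WIDTH JOINER


def conjunct(consonants: list, vowel_sign: str = "") -> str:
    """Encode a ccv, cccv, ... syllable: consonants joined by the halanth."""
    return VIRAMA.join(consonants) + vowel_sign


def half_form(consonant: str) -> str:
    """Request the half (joining) form of a consonant on its own."""
    return consonant + VIRAMA + ZWJ


if __name__ == "__main__":
    KA, SSA, YA = "\u0915", "\u0937", "\u092F"
    print(conjunct([KA, SSA]))      # k + sh, drawn with a unique form in Devanagari
    print(conjunct([KA, YA]))       # k + y, normally drawn with the half form of k
    print(half_form(KA))            # half form of k by itself
```

The same stored sequence can therefore be shown with a half form in one font and as a stacked form in another, without any change to the text.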

Composing syllables with generic consonants

  The shape of a syllable can always be built by using the generic form of the consonants. This will be linguistically correct though not conforming to convention. Using generic consonants to write syllables generally results in a smaller set of shapes for the writing system. Among the Indian languages, Tamil employs a simple script where a syllable is always shown in this manner.
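Writing a syllable with generic consonants can be sketched in the same way. The example assumes Unicode text, where a ZERO WIDTH NON-JOINER placed after the halanth asks the renderer to show the consonant with a visible halanth instead of forming a conjunct. The function name `with_generic_consonants` is hypothetical, and Devanagari is used only as an example.

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halanth)
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER


def with_generic_consonants(consonants: list, vowel_sign: str = "") -> str:
    """Write every consonant except the last in its generic form."""
    *leading, last = consonants
    generic = "".join(c + VIRAMA + ZWNJ for c in leading)
    return generic + last + vowel_sign


if __name__ == "__main__":
    KA, SSA = "\u0915", "\u0937"
    # k with a visible halanth followed by the full form of sh
    print(with_generic_consonants([KA, SSA]))
```

Only the basic consonant shapes, the halanth and the matras are needed to display text written this way, which is the reduction in the number of shapes described above.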

Syllable  Representation Examples

When we compare the rules across different scripts, the following seem to apply in general, though different rules may apply in different scripts for the same syllable. In other words, several displayed forms may refer to the same sound.
  • Concatenate half forms except for the last consonant. 
  • Write the consonants one below the other but retain their basic shapes with diminished size.
  • Use special ligatures for specific vowel combinations in some of the scripts.
  • Use unique forms for a syllable.
  • Just decompose any syllable into its consonants and the vowel.
  • Use special ligatures for "ra" in Devanagari based scripts. The ligature will depend on where "ra" occurs within the syllable.
  • Use special ligatures for other consonants as well. This applies to Telugu.
  • The medial vowel representations may have ligatures on both sides of the consonant.
 The following are illustrative of syllable formation in different scripts. The variations in the writing systems will be seen by examining these carefully. This is not an exhaustive set but is provided only as an example.