Unicode for Tamil

  Among the Indian languages, Tamil employs one of the simpler scripts, with an equally simple orthography. Remarkably, this simplicity was already evident two thousand years ago. The Brahmi script for Tamil employs a limited number of shapes, restricted to the twelve vowels and eighteen consonants. Conjunct aksharas involving two or more consonants are simply composed from the generic forms of the consonants, so ligatures and conjunct forms are avoided altogether. Pure Tamil could therefore be written with 18 x 12 = 216 consonant-vowel shapes together with 18 generic consonant shapes and 12 vowel shapes. The special letter known as the "ayda" letter should also be included, which brings the total to 247 shapes. The number can be reduced further if typesetting is done using overlapped shapes.
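The count of 247 can be checked with a few lines of Python. This is only a sketch of the arithmetic described above; the function and constant names are illustrative and not part of any standard.

```python
# Counts taken from the paragraph above.
VOWELS = 12       # the twelve vowels
CONSONANTS = 18   # the eighteen consonants
AYDA = 1          # the special "ayda" letter


def tamil_shape_count(vowels=VOWELS, consonants=CONSONANTS, special=AYDA):
    """Count the shapes needed if every syllable gets its own shape."""
    combined = consonants * vowels  # consonant-vowel syllables: 18 x 12
    return combined + consonants + vowels + special


print(tamil_shape_count())  # 247
```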

  Today, Tamil orthography retains its simplicity, though the shapes have changed since the days of Brahmi. It might seem that 247 code values would adequately represent the syllables of Tamil. This is indeed so, but in practice one must concede the presence of consonants from Sanskrit that have come into fairly regular use. There are six Sanskrit consonants that should be included, along with their combinations with the twelve vowels and their generic forms. This adds 6 x 12 + 6 = 78 more shapes to the set, for 325 in all. The full set will therefore require more than 8 bits per code.
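The same arithmetic shows why 8 bits are not enough. The sketch below assumes a fixed-width code with one value per shape; the helper name is hypothetical.

```python
def syllable_shapes(consonants, vowels=12):
    """Shapes contributed by a group of consonants: one per vowel plus the generic form."""
    return consonants * vowels + consonants


base_set = 247                        # pure Tamil, as counted earlier
sanskrit_extra = syllable_shapes(6)   # six Sanskrit consonants
total = base_set + sanskrit_extra

# Smallest number of bits that can give every shape a distinct code value.
bits_needed = (total - 1).bit_length()

print(sanskrit_extra)  # 78
print(total)           # 325
print(bits_needed)     # 9
```

Since 8 bits give only 256 distinct values, 325 shapes need at least 9 bits.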

  Unicode for Tamil initially specified codes only for the basic vowels and consonants together with the medial vowel forms.  The "pulli" that turns a consonant into its generic (linguistically defined) form is also viewed as a medial vowel.
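A small sketch using Python's standard `unicodedata` module illustrates this model. The choice of the consonant "ka" and the vowel sign "aa" is only for illustration; note that the Unicode character name for the pulli is TAMIL SIGN VIRAMA.

```python
import unicodedata

KA = "\u0B95"        # basic consonant, carries the implied vowel
SIGN_AA = "\u0BBE"   # medial vowel form
PULLI = "\u0BCD"     # the dot that removes the implied vowel

ka_with_aa = KA + SIGN_AA   # one syllable, two code points
ka_generic = KA + PULLI     # generic consonant, also two code points

for ch in (KA, SIGN_AA, PULLI):
    # Print the code point, its general category and its Unicode name.
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))

print(len(ka_with_aa), len(ka_generic))  # 2 2
```

Both the vowel sign and the pulli fall in the "mark" categories (names starting with "M"), which is how the standard treats them alike as signs attached to a base consonant.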

  The initial assignment of Unicode for Tamil included only nine numerals, omitting a code for zero, since it was presumed that symbols were available for ten, hundred and so on. This omission was the subject of much debate, and the recent version of Unicode has added the zero. The Sanskrit consonant "sha" (the one seen in Shree), which was not included in the earlier version, has also been added.
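The additions can be inspected with Python's `unicodedata` module. The code points below (U+0BE6 for the zero, U+0BB6 for the letter seen in Shree, U+0BF0 for the symbol for ten) are the ones I believe correspond to the characters discussed; treat the mapping as an assumption to be checked against the Tamil code chart.

```python
import unicodedata

TAMIL_ZERO = "\u0BE6"   # assumed: the digit zero added later
TAMIL_SHA = "\u0BB6"    # assumed: the consonant seen in Shree
TAMIL_TEN = "\u0BF0"    # assumed: the traditional symbol for ten

print(unicodedata.name(TAMIL_ZERO))     # TAMIL DIGIT ZERO
print(unicodedata.name(TAMIL_SHA))      # TAMIL LETTER SHA
print(unicodedata.numeric(TAMIL_TEN))   # 10.0

# With a zero available, Tamil digits work as ordinary decimal digits.
one_zero = "\u0BE7" + TAMIL_ZERO        # digit one followed by digit zero
print(int(one_zero))                    # 10
```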

Differing opinions among the experts in respect of Tamil Unicode.

  During the past few years, the assignment of Unicode for Tamil has been much debated in the computing circles of Tamilnadu. Regrettably, the discussions have not resulted in any form of consensus. The differing opinions have to do with the interpretation of the linguistic definition of a consonant. The basic issue is whether codes must be assigned to a generic consonant (without any vowel) or whether it is acceptable to view a consonant as one with the implied "ah". Apparently, the experts have woken up to the fact that the coding method should conform to linguistic requirements. The absence of consensus has created more confusion, and it is unlikely that any of the ideas discussed will find a place in the coding scheme.

  Given below are some links to pages discussing the issue of Unicode for Tamil. Some of them are proposals for effecting changes to current Unicode assignments. Some of the URLs are difficult to reproduce since they contain spaces or other wide characters. It should be possible to access the documents with a bit of effort though.
(Tamil Encoding in Unicode - A Comparative Study) 
(Problems as perceived by an expert)
(Though not directly related to Unicode, a standard that is popular),month,2005-08.aspx
(The URL may be difficult to input. Try a search for the last part of the URL)
Tamil%20Unicode.html    (Please search for Natkeeran at this site)


Tamil script is perhaps the simplest among the scripts of India.

Tamil orthography uses a relatively small set of shapes, and syllables with two or more consonants use the dot (pulli) to advantage.

It is possible to typeset Tamil with as few as 80 glyphs in a font. In fact, most 8-bit fonts designed for Tamil have fewer than 90 glyphs. Such fonts cannot be used directly as OpenType fonts, since syllables would have to be composed from one or more glyphs.
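As an illustration of composing a syllable from several pieces, the sketch below splits the syllable "ko" into the parts that are drawn to the left and right of the consonant. It works on Unicode code points, not on the glyph codes of any particular 8-bit font, and the function name and the set of left-side signs are assumptions made for this example.

```python
import unicodedata

# Vowel signs drawn to the left of the consonant (E, EE, AI).
LEFT_SIDE_SIGNS = {"\u0BC6", "\u0BC7", "\u0BC8"}


def visual_order(syllable):
    """Return the pieces of a consonant + vowel-sign syllable in drawing order."""
    # NFD splits a two-part vowel sign into its left and right halves.
    parts = unicodedata.normalize("NFD", syllable)
    base, signs = parts[0], parts[1:]
    left = [s for s in signs if s in LEFT_SIDE_SIGNS]
    right = [s for s in signs if s not in LEFT_SIDE_SIGNS]
    return left + [base] + right


ko = "\u0B95\u0BCA"   # KA followed by the two-part vowel sign O
pieces = visual_order(ko)
print([unicodedata.name(p) for p in pieces])
# ['TAMIL VOWEL SIGN E', 'TAMIL LETTER KA', 'TAMIL VOWEL SIGN AA']
```

Three simple pieces are enough to draw this syllable, which is why a small glyph set can cover the script.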

OpenType fonts available for Tamil also incorporate a smaller set of glyphs compared to those for other scripts. It is quite easy to design an OpenType font by specifying "composite glyphs".

One of the most debated issues in respect of Tamil Unicode is not the coding scheme but the data entry scheme! There are no meaningful standards that are followed despite some recommendations during the Tamilnet99 meet.

The difficulties encountered in coping with the implementation of Unicode for Indian Languages have been covered in a separate section of this site.

Though rendering of Tamil Unicode text should pose fewer difficulties compared to other scripts, one continues to encounter problems. Difficulties encountered in dealing with Tamil Unicode are explained in a section with examples from applications running under Windows Vista and Ubuntu.
  Inscription in Early Brahmi script inside a cave situated in Tamilnadu, South India. The text includes the word "satiaputo" which stands for Emperor Ashoka whose emissaries spread Buddhism in the South and Sri Lanka. The letter "sa" is not seen in Tamil and so the inscription must have been effected by persons who knew Sanskrit as well as Tamil.

Image graciously offered for reproduction in this page by Sri. Iravatham Mahadevan.


Last updated on 10/30/12