Transliteration Principles
  Transliteration refers to the process of writing the words and sentences of one language using the letters and special symbols of another, so that they can be read and pronounced by someone who knows only the second script. Transliteration is thus meant to preserve the sounds of the syllables in words. It is helpful in situations where one does not know the script of a language but can nevertheless speak and understand it.

  For several decades now, Roman transliteration has been used to represent texts of Indian languages, especially Sanskrit. In many printed books, a key to the transliteration is printed at the beginning in the form of a table. Since it is difficult to represent the aksharas of Sanskrit using just the twenty-six letters of the Roman alphabet, scholars have used varying schemes to accommodate sounds that have no direct Roman equivalent.

  Here are some examples of transliteration as per the schemes which were in general use in the past. The schemes are somewhat arbitrary in the choice of the Roman letters.

  Sometimes phonetic symbols are used in place of the normal Roman letters. Phonetic symbols are basically the letters of the Roman alphabet with special marks known as diacritic marks. Here are some examples of transliteration using symbols from the phonetic alphabet. In the second set of aksharas shown below, one sees the use of special symbols from the ASCII character set in place of diacritics.


Roman transliteration which makes use of diacritic marks works better for Indian languages, and in the last few decades some standardization has been effected based on the recommendation from the National Library in Calcutta. Roman letter assignments in this scheme are phonetically equivalent to the aksharas of Sanskrit or other Indian languages. As indicated earlier, the phonetic alphabet with diacritic marks is very helpful for representing text in Indian languages. Such letters are also easily typeset, for typefaces are available specifically for this purpose. Typesetting was, however, done manually for nearly a century until special word processing and typesetting applications were developed for computers. These programs make use of high quality fonts to produce good printouts and displays. However, most of them rely on indirect data entry methods to generate the phonetic symbols.
   The primary difficulty in data entry of the phonetic symbols is that there is no provision to input the symbols directly using the standard ASCII keyboard. Desktop publishing and word processing programs provide means by which the glyph code of the symbol is input using the numeric keypad. While this is acceptable, it is not a natural approach. Transliteration methods which use only the displayable ASCII symbols do not run into this problem, since the ASCII letters can be typed in directly. A special computer program is, however, required to interpret the input string and produce the Indian language display or printout. This is precisely what the currently popular transliteration schemes attempt. Schemes such as ITRANS, RIT, ADHAWIN etc., use only the standard displayable ASCII letters and symbols to transliterate the text. These schemes allow multiple representations for certain syllables and long vowels, but the processing program handles this well.
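
   As a rough illustration of how a processing program might interpret an ASCII input string, here is a minimal sketch in Python. It assumes a tiny, hypothetical subset of an ITRANS-like table (the names TOKENS and tokenize are ours and not part of any scheme or package) and uses greedy longest-match scanning, so that alternative spellings of a long vowel such as "aa" and "A" reduce to the same unit.

```python
# Hypothetical subset of an ITRANS-like table. Each ASCII spelling maps to
# one canonical sound unit; alternative spellings share a canonical unit.
TOKENS = {
    "a": "a", "aa": "A", "A": "A",
    "i": "i", "ii": "I", "I": "I",
    "k": "k", "kh": "kh", "g": "g", "gh": "gh",
    "m": "m", "r": "r", "y": "y", "o": "o",
}
MAX_LEN = max(len(spelling) for spelling in TOKENS)


def tokenize(text):
    """Split an ASCII transliterated string into canonical sound units.

    At every position the longest spelling found in TOKENS wins, so that
    "kh" is read as one unit rather than as "k" followed by "h".
    """
    units = []
    pos = 0
    while pos < len(text):
        for size in range(MAX_LEN, 0, -1):
            chunk = text[pos:pos + size]
            if chunk in TOKENS:
                units.append(TOKENS[chunk])
                pos += len(chunk)
                break
        else:
            raise ValueError(f"no table entry at position {pos}: {text[pos]!r}")
    return units


print(tokenize("raama"))  # same units as tokenize("rAma")
print(tokenize("rAma"))
```

   A real program would go on to map the units to glyphs of the target script; the sketch stops at the point where the two alternative spellings have been reduced to one internal form.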

Lack of uniformity between different schemes.
   While transliteration-based data input is very useful, one must remember that the schemes themselves vary, even for a given language. As a consequence, the data entry procedures change with the scheme and, worse still, a given transliterated string will produce different outputs for different languages/scripts. Take for instance the word 'yoga'. The transliterated data input for this string using the "ITRANS" scheme is "yogA". However, when you use this string to get an output in Tamil using other schemes, you will get an output that differs from the correct transliteration. The fact that the short forms of the vowels "o" and "e" are present only in the Southern languages is the real issue here.

    Transliteration schemes have to face the problem of letters that are present in one language but not in another. Thus, unless a superset of letters from all the Indian languages is formed, uniform transliteration is ruled out. Even if such a superset were identified, it turns out that unique Roman letter combinations are not easily identified for complex aksharas. Moreover, the large number of vowels in Indian scripts also adds to the complexity of transliteration.

String Processing using transliterated text.
   One useful feature of the transliterated representation of Indian language strings is that conventional string processing programs may be used to process the text. However, applications such as sorting will produce erroneous results, as the sorting order of the aksharas is quite different from that of the Roman letters. Many string processing applications, such as processing a sentence, may however work properly, so long as the input strings do not contain special characters which are needed for transliteration but which also happen to be delimiters fixed for the parsing routines.
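
   The sorting problem can be seen with a small sketch in Python. The ORDER list below is a hypothetical subset of the traditional Sanskrit letter order (vowels first, then consonants beginning with the ka group), written with ITRANS-like spellings; sort_key is an illustrative helper and not part of any existing package.

```python
# Hypothetical subset of the traditional letter order, in ITRANS-like
# spelling: vowels first, then the consonants in their usual groups.
ORDER = ["a", "A", "i", "I", "u", "U", "e", "o",
         "k", "kh", "g", "gh", "t", "th", "d", "dh", "n",
         "p", "ph", "b", "bh", "m", "y", "r", "l", "v", "s", "h"]
RANK = {unit: index for index, unit in enumerate(ORDER)}


def sort_key(word):
    """Return the list of letter ranks for a transliterated word."""
    key = []
    pos = 0
    while pos < len(word):
        pair = word[pos:pos + 2]
        if len(pair) == 2 and pair in RANK:   # two-letter units such as "kh"
            key.append(RANK[pair])
            pos += 2
        elif word[pos] in RANK:
            key.append(RANK[word[pos]])
            pos += 1
        else:
            raise ValueError(f"unknown letter {word[pos]!r} in {word!r}")
    return key


words = ["kamala", "Ananda", "gagana", "amara"]
print(sorted(words))                # plain ASCII order
print(sorted(words, key=sort_key))  # order of the aksharas
```

   Plain ASCII sorting places the upper case "A" before the lower case "a" and "g" before "k", whereas the letter order of the script places the short vowel before the long one and ka before ga.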

    With transliterated input, the representation of a syllable is always multibyte, with the number of bytes varying from syllable to syllable. For example, if we examine the aksharas in the second row of the letters seen in the image above, we see that the last two words contain two aksharas each (samyuktaksharas are treated as aksharas since they constitute one syllable). However, the word "Arya" has four ASCII letters while the word "dhR^shTvA" has nine. So linguistically speaking, transliteration using Roman letters may not be the best choice for text processing at the level of a syllable.

    It would be helpful to have a representation which uses a fixed number of bytes for each syllable. Such a representation would be ideally suited for studying the metrical structure of poems or slokas.
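
    A minimal sketch in Python of what such a fixed-width representation could look like is given below. It is only an illustration under stated assumptions: the vowel and consonant sets are a small hypothetical subset in ITRANS-like spelling, an akshara is taken to be a run of consonants closed by a vowel, and each distinct akshara is simply given the next free 16-bit number in a codebook. None of the names (aksharas, encode, codebook) come from an existing package.

```python
import struct

# Hypothetical subset of sound units, in ITRANS-like spelling.
VOWELS = {"a", "A", "i", "I", "u", "U", "e", "o"}
CONSONANTS = {"k", "kh", "g", "gh", "t", "th", "d", "dh", "n",
              "p", "ph", "b", "bh", "m", "y", "r", "l", "v", "s", "h"}


def aksharas(word):
    """Split a transliterated word into aksharas (consonants plus a vowel)."""
    result, current, pos = [], "", 0
    while pos < len(word):
        pair = word[pos:pos + 2]
        if len(pair) == 2 and pair in CONSONANTS:
            unit = pair                      # aspirates such as "kh"
        elif word[pos] in VOWELS or word[pos] in CONSONANTS:
            unit = word[pos]
        else:
            raise ValueError(f"unknown letter {word[pos]!r} in {word!r}")
        pos += len(unit)
        current += unit
        if unit in VOWELS:                   # a vowel closes the akshara
            result.append(current)
            current = ""
    if current:                              # trailing consonants, no vowel
        result.append(current)
    return result


def encode(word, codebook):
    """Pack a word as one 16-bit code per akshara (two bytes each)."""
    ids = [codebook.setdefault(a, len(codebook)) for a in aksharas(word)]
    return struct.pack(f">{len(ids)}H", *ids)


codebook = {}
for word in ("Arya", "yoga"):
    print(word, aksharas(word), len(encode(word, codebook)), "bytes")
```

    With this layout, the number of aksharas in a line is always half the number of bytes, which is the kind of property that makes counting syllables in a sloka straightforward.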

Transliteration features in the IIT Madras Software.
   The Multilingual software from IIT Madras has incorporated features to help deal with transliterated text. The multilingual editor has a data entry method that directly allows transliterated text to be typed in and the text viewed in local scripts. A .llf file is also automatically created by the editor. Those familiar with ITRANS based input will find this feature helpful.  We also have some utilities for viewing and converting transliterated text. Additional information about the utilities is available.

A note on the vagaries of transliteration.

The primary aim of transliteration is to provide an alternate means of reading text using a different script. Transliteration is meant to preserve the sounds. In practice, the same word may be written differently in different scripts due to the local conventions employed for pronouncing the aksharas.

In the northern scripts, the absence of the halanth is seen quite frequently but a person will understand that what is shown is really a generic consonant. The Southern scripts are explicit in the use of the halanth.

In India, English is also transliterated into the different Indian scripts with amusing results! Finding exactly matching aksharas is difficult for some of the vowels and a few consonants. A word in English, transliterated into say Hindi or Bengali, would have changed considerably when transliterated again into another script (typically the Southern scripts). Some of the problems encountered in practice are covered in a separate page titled "Vagaries of Transliteration".
Last updated on 11/17/12