Transliteration schemes
  During the past several years, different methods have been introduced to prepare Indian language documents by entering the text through specific transliteration schemes. Data entry through transliteration is quite close to phonetic mapping of Indian language characters to the letters of the Roman alphabet.

 Notable among these methods are:

The ITRANS package, developed by Avinash Chopde, prints documents through LaTeX. ITRANS now supports several Indian languages, but its transliteration scheme is not uniform across them. It is nevertheless a highly recommended package for printing documents. A substantial number of Sanskrit and Hindi documents have been prepared using ITRANS.

The RIT package was developed by RamaRao Kanneganti and Ananda Kishore to prepare Telugu text. RIT is similar to ITRANS but offers greater flexibility during data entry. RIT also relies on LaTeX to produce its output. A large number of Telugu documents (poetry and texts) are available in RIT format.

The ADAMI and ADHAWIN packages of Dr. Srinivasan are exclusively for Tamil and are based on the principle of using TrueType fonts to view documents under Windows or the Mac. They constituted one of the earliest approaches (1995) to dealing with Tamil text on a PC. Data entry in Tamil uses a special transliteration scheme different from ITRANS. The software uses a specific internal 8-bit code for the Tamil characters, which permits direct viewing on the screen using TrueType fonts specially designed for this purpose. The Adhawin package is quite popular among the net Tamil community, and a good number of Tamil documents have already been prepared using it. As of 2000, these schemes have yielded to other methods for dealing with Tamil text on a system.

The MYLAI fonts software, developed by Dr. Kalyana Sundaram of the Swiss Federal Institute of Technology, is again for use with Tamil. Data entry for this scheme is based on transliterated Tamil using Roman letters, but the transliteration scheme is different from that of Adhawin. Mylai fonts are supported under Windows, Mac and Unix. On account of the transliteration scheme used, both MYLAI and Adhawin permit Tamil text to be included within web pages.

A comparison of the above schemes is provided in a separate page.



Other Schemes 

  A review of the archives of Indian language documents on the net reveals several other schemes of transliteration and fonts. The Indology site in England has electronic texts of Sanskrit documents prepared in CSX format, a special input method recommended in 1990 for Sanskrit data entry using a DOS feature called code page switching. ITRANS, which is more recent, offers facilities to convert from CSX to the ITRANS format.

  The Institute of Indology and Tamil Studies in Germany (IITS) has an archive of texts of Tamil Sangam literature and many Sanskrit documents. These archives are based on the transliteration scheme recommended by the University of Madras, a fairly well known and accepted standard. This scheme is somewhat different from the other transliteration schemes used for Tamil.

  The Mahabharata and Ramayana texts have been prepared by Prof. Muneo Tokunaga of Kyoto University in Japan. The massive amount of effort that has gone into preparing the archives deserves special mention, as do the dedication and patience with which the project was undertaken and completed.

  A list of resources for documents in Indian languages is provided by the linked site in Helsinki, Finland.

  Apart from these, several schemes (and software) have been introduced to permit display of Indian language text on the web. Many of these schemes are contributions from groups interested in web sites for different magazines. 

Transliteration schemes and Fonts 

  Since transliteration schemes have become popular, it may be useful to discuss some technical issues relating to the methods used to prepare HTML documents with embedded Indian language text. The following paragraphs provide some insight.

  Ultimately, the display of characters on a screen or printer requires what is known as a rendering program, which generates the shape of the character from the encoding used to represent the character. Typically this is accomplished through FONTS, and web browsers have excellent capabilities to handle different types of fonts. HTML documents may contain text in specific languages by specifying the font to be used while displaying the text. This was by and large the method used to include text in other languages within an HTML document. It must be noted that the browser viewing the document should be able to load the required FONT locally. Only then is the text correctly displayed. In the absence of the required fonts, the browser will use some default and the text will not be intelligible at all.
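
  As a minimal sketch of the font-based approach described above, the Python snippet below wraps already encoded text in the legacy HTML font element. The helper name wrap_in_font, the font name ExampleIndicFont and the sample string are hypothetical, chosen only for illustration.

```python
from html import escape

def wrap_in_font(encoded_text, font_face):
    """Return an HTML fragment asking the browser to draw the text in font_face."""
    face = escape(font_face, quote=True)
    return f'<font face="{face}">{escape(encoded_text)}</font>'

# "bBmA" stands in for font-specific codes (an assumption for this sketch).
# A browser without ExampleIndicFont would show these four Latin letters as-is.
fragment = wrap_in_font("bBmA", "ExampleIndicFont")
print(fragment)
```

  The fragment carries only character codes and a font name, so a browser that cannot load the named font has no way of recovering the intended letters.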

  It has been argued (with enough justification) that displaying multilingual text using the above approach is not the right way to deal with multiple scripts in a single document. Applications should be aware of the character set used to represent the text in a document. The above approach takes care of only the display, and linguistic processing will be severely hampered. For Indian languages, the font-specific method has nevertheless remained in use in spite of variations observed between different fonts, even for a given language. The situation has changed somewhat since Unicode support was introduced, and today Indian language text can be handled through Unicode, though there are some serious linguistic issues to be answered.

  Transliteration schemes employed while entering Indian language text may have no connection with FONTS at all. That is, the transliteration mechanism is only a means of identifying what letters (vowels, conjuncts, consonants etc.) are present in the text that should be displayed in the specified Indian language. The transliterated text is then converted into the required encoding of the characters to be shown, and this encoding is specific to the FONT chosen to display the text.
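
  The two steps described above can be sketched in Python as follows. Every table here is invented for illustration: the Roman tokens, the letter names and the two font layouts (FONT_X and FONT_Y) are assumptions and do not reproduce ITRANS, Adhawin, Mylai or any real font. The sketch also ignores conjunct and vowel-sign formation.

```python
# Step 1 (font independent): Roman transliteration -> abstract letter names.
ROMAN_TO_LETTER = {
    "aa": "VOWEL_AA",
    "a": "VOWEL_A",
    "kh": "CONS_KHA",
    "k": "CONS_KA",
    "m": "CONS_MA",
}

# Step 2 (font specific): letter name -> code position in one particular font.
FONT_X = {"VOWEL_A": 0x41, "VOWEL_AA": 0x42, "CONS_KA": 0x61,
          "CONS_KHA": 0x62, "CONS_MA": 0x6D}
FONT_Y = {"VOWEL_A": 0xC1, "VOWEL_AA": 0xC2, "CONS_KA": 0xD1,
          "CONS_KHA": 0xD2, "CONS_MA": 0xE5}

def parse_letters(text):
    """Split transliterated text into letter names, longest token first."""
    tokens = sorted(ROMAN_TO_LETTER, key=len, reverse=True)
    letters = []
    i = 0
    while i < len(text):
        for token in tokens:
            if text.startswith(token, i):
                letters.append(ROMAN_TO_LETTER[token])
                i += len(token)
                break
        else:
            raise ValueError(f"no letter matches at position {i}: {text[i:]!r}")
    return letters

def encode_for_font(letters, font_table):
    """Convert letter names into the byte codes expected by one font."""
    return bytes(font_table[name] for name in letters)

letters = parse_letters("khaama")
print(letters)
print(encode_for_font(letters, FONT_X).hex())
print(encode_for_font(letters, FONT_Y).hex())
```

  The list of letter names is the same for both fonts. Only the final byte string differs, which is why the stored text is tied to the font it was encoded for.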

  Packages like ITRANS rely on METAFONT, which is basically a program that generates the display from descriptions of the shape of each character. Packages like Adhawin use FONTS which were designed for handling Tamil. In these, the transliteration relates directly to the placement of the glyph (the shape of the Tamil letter) within the ADHAWIN font, which must be used by the package. Data entry thus becomes font specific, and the text prepared using the package will not be rendered correctly if some other font is used for the display.
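
  The font dependence described above can be illustrated with a small Python sketch. The two glyph tables are hypothetical and do not reproduce the layout of Adhawin, Mylai or any other real font. Passing no table models a browser falling back to a default Latin font.

```python
# Hypothetical glyph tables: the same byte selects a different Tamil letter
# in each font. The letters are written as Unicode escapes.
FONT_A_GLYPHS = {0x61: "\u0b95", 0x62: "\u0bae"}  # Tamil ka, ma
FONT_B_GLYPHS = {0x61: "\u0ba4", 0x62: "\u0baa"}  # Tamil ta, pa

def render(data, glyphs=None):
    """Return the letters a reader would see for the stored bytes."""
    if glyphs is None:
        # No Tamil font available: the bytes are drawn as Latin characters.
        return data.decode("latin-1")
    return "".join(glyphs.get(code, "?") for code in data)

stored = bytes([0x61, 0x62])           # text as saved by a font-specific package
print(render(stored, FONT_A_GLYPHS))   # the letters the author intended
print(render(stored, FONT_B_GLYPHS))   # different letters under another font
print(render(stored))                  # Latin fallback
```

  The stored bytes never change. What the reader sees depends entirely on which glyph table is applied at display time.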

  Preparing web pages (HTML documents) which include Indian language text requires the use of word processors which support FONTS. Even with that support, additional factors must be taken into account while entering text. Due to the fairly complex nature of the Indian scripts, data entry is quite cumbersome with most word processors. A point to remember here is that word processors are not yet universal enough to run on all platforms. At least in the context of Indian language text entry, a standardized editor is very much required.
 

Older schemes

 The idea of transliteration is not new. For more than a century, printed books used a suitable transliteration scheme with Roman letters and diacritics to display text in Indian scripts.

  Early attempts at using transliteration with computers started with TeX, where ASCII letters were used to specify the aksharas of Indian languages. TeX has the potential to display a nearly complete set of aksharas on account of its ability to deal with fonts, each of which can in principle hold up to 256 glyphs.

____________________

CSX 
(Classical Sanskrit Extended)

  CSX was introduced around 1990 during the World Sanskrit Conference. It was not strictly a transliteration scheme but a set of codes to identify the basic aksharas of Indian scripts. These codes went beyond the normal ASCII range of 32-126 for displayable letters.

  Computer programs would process these codes for linguistic applications. Some CSX fonts were created and could be used on systems which rendered both ASCII and Upper ASCII codes.
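
  As a rough illustration of what going beyond the ASCII range means for a program, the Python sketch below separates the byte codes of a text into 7-bit ASCII codes and upper codes (128 and above). The sample bytes are arbitrary and are not claimed to be actual CSX assignments. The function names are hypothetical.

```python
def split_code_ranges(data):
    """Return (ascii_codes, upper_codes) for a byte string."""
    ascii_codes = [code for code in data if code < 128]
    upper_codes = [code for code in data if code >= 128]
    return ascii_codes, upper_codes

def is_seven_bit(data):
    """True when the text can pass through an ASCII-only channel unchanged."""
    return all(code < 128 for code in data)

# 0xE0 is an arbitrary upper code used as a stand-in for an extended letter.
sample = bytes([0x72, 0xE0, 0x6D, 0x61])
print(split_code_ranges(sample))
print(is_seven_bit(sample))
```

  Text containing upper codes needs a matching font or code page on every system that displays it, whereas text limited to ASCII does not.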

____________________

TeX

  Frans Velthuis went one step further and gave a scheme of transliteration using only displayable ASCII. This allowed Devanagari text to be viewed in transliterated form on standard displays on many different computers. The processing relied on TeX to produce quality output which could be viewed as PostScript.

  The scheme allowed easy preparation of Devanagari text using the ASCII keyboard and high quality printouts could be produced using Velthuis' specially written preprocessor. 
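
  The idea of an ASCII-only scheme can be sketched in Python as below. This is not Velthuis' preprocessor: it covers only a subset of the Velthuis tokens as they are commonly documented, and it produces Unicode Devanagari as a stand-in target, whereas the original system produced input for TeX fonts. The function name and the handling of unknown characters are assumptions.

```python
VIRAMA = "\u094d"  # suppresses the inherent vowel of a consonant

# token: (independent vowel, dependent vowel sign); "a" has no sign.
VOWELS = {
    "a": ("\u0905", ""), "aa": ("\u0906", "\u093e"),
    "i": ("\u0907", "\u093f"), "ii": ("\u0908", "\u0940"),
    "u": ("\u0909", "\u0941"), "uu": ("\u090a", "\u0942"),
    ".r": ("\u090b", "\u0943"),
    "e": ("\u090f", "\u0947"), "ai": ("\u0910", "\u0948"),
    "o": ("\u0913", "\u094b"), "au": ("\u0914", "\u094c"),
}
CONSONANTS = {
    "k": "\u0915", "kh": "\u0916", "g": "\u0917", "gh": "\u0918", '"n': "\u0919",
    "c": "\u091a", "ch": "\u091b", "j": "\u091c", "jh": "\u091d", "~n": "\u091e",
    ".t": "\u091f", ".th": "\u0920", ".d": "\u0921", ".dh": "\u0922", ".n": "\u0923",
    "t": "\u0924", "th": "\u0925", "d": "\u0926", "dh": "\u0927", "n": "\u0928",
    "p": "\u092a", "ph": "\u092b", "b": "\u092c", "bh": "\u092d", "m": "\u092e",
    "y": "\u092f", "r": "\u0930", "l": "\u0932", "v": "\u0935",
    '"s': "\u0936", ".s": "\u0937", "s": "\u0938", "h": "\u0939",
}
MARKS = {".m": "\u0902", ".h": "\u0903"}  # anusvara, visarga

# Longest tokens first, so that ".th" wins over ".t" and "aa" over "a".
TOKENS = sorted([*VOWELS, *CONSONANTS, *MARKS], key=len, reverse=True)

def velthuis_to_devanagari(text):
    """Convert a subset of Velthuis transliteration to Unicode Devanagari."""
    out = []
    after_consonant = False  # True while a consonant still awaits its vowel
    i = 0
    while i < len(text):
        token = next((t for t in TOKENS if text.startswith(t, i)), text[i])
        i += len(token)
        if token in CONSONANTS:
            if after_consonant:
                out.append(VIRAMA)  # consecutive consonants form a conjunct
            out.append(CONSONANTS[token])
            after_consonant = True
        elif token in VOWELS:
            independent, sign = VOWELS[token]
            out.append(sign if after_consonant else independent)
            after_consonant = False
        else:
            if after_consonant:
                out.append(VIRAMA)
            out.append(MARKS.get(token, token))  # unknown characters pass through
            after_consonant = False
    if after_consonant:
        out.append(VIRAMA)
    return "".join(out)

print(velthuis_to_devanagari("raama"))
print(velthuis_to_devanagari("sa.msk.rta"))
```

  The input stays readable on any ASCII display, and the rules for vowel signs and conjuncts are applied only when the text is converted for output.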

____________________

Harvard Kyoto Scheme

  This scheme was used by scholars from Japan who prepared electronic versions of important texts in Sanskrit, viz. the Ramayana and the Mahabharata.

  The URL below provides the link to a table which compares many different transliteration schemes in vogue at the time of preparing the texts.

Table of Transliteration schemes

(http://texa.human.is.tohoku.ac.jp/)
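
  To give a concrete feel for how such schemes relate to one another, the Python sketch below converts Harvard Kyoto text into the Roman-with-diacritics form used in printed books (IAST). The mapping follows the Harvard Kyoto convention as it is commonly documented and is not taken from the table linked above. The function name is hypothetical.

```python
# Harvard Kyoto tokens that differ from plain Roman letters, mapped to IAST.
HK_TO_IAST = {
    "lRR": "\u1e39", "RR": "\u1e5d", "lR": "\u1e37",   # long l-vowel, long r-vowel, l-vowel
    "A": "\u0101", "I": "\u012b", "U": "\u016b",       # long a, i, u
    "R": "\u1e5b",                                     # vocalic r
    "M": "\u1e43", "H": "\u1e25",                      # anusvara, visarga
    "G": "\u1e45", "J": "\u00f1",                      # velar and palatal nasals
    "T": "\u1e6d", "D": "\u1e0d", "N": "\u1e47",       # retroflex t, d, n
    "z": "\u015b", "S": "\u1e63",                      # palatal and retroflex sibilants
}

# Longest tokens first, so that "lRR" is tried before "lR" and "R".
HK_TOKENS = sorted(HK_TO_IAST, key=len, reverse=True)

def hk_to_iast(text):
    """Convert Harvard Kyoto transliteration to IAST; other characters pass through."""
    out = []
    i = 0
    while i < len(text):
        for token in HK_TOKENS:
            if text.startswith(token, i):
                out.append(HK_TO_IAST[token])
                i += len(token)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

print(hk_to_iast("rAmAyaNa"))
print(hk_to_iast("mahAbhArata"))
```

  Harvard Kyoto uses upper and lower case to distinguish letters, so the input must not be case-folded before conversion.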

____________________

Schemes for Tamil

  At least six or seven different schemes have been in use for Tamil. Some of the schemes were platform specific and required the use of special fonts. A comparison of these schemes is provided at the web site maintained by Dr. Kalyana Sundaram.
 
 
 
 

 

Last updated on 10/26/12     Best viewed at 800x600 or better