Application areas for the IITM Software.
    The primary aim of the project taken up at the Systems Development Laboratory has been to develop a system of computer programs which permit a uniform and language independent approach to designing computer applications supporting user interfaces in all the Indian languages. 

    What this means is that a set of software tools is available to users for developing new applications supporting interaction with the computer in one's own mother tongue. The IITM Software is therefore quite different from software such as editors, word processors and databases, which may permit data entry and display in Indian languages. We believe that there are many applications peculiar to India which cannot be handled satisfactorily by adapting existing software that supports input and display in Indian scripts. Text processing with Indian language text requires an approach consistent with the writing systems employed for the different languages.

    An important point in respect of the IIT Madras software is that it includes tools for processing multilingual text. With these tools one would be able to perform very effective string processing on the text and thus cater to applications in linguistics, lexical analysis, parsing etc. The ability to support Roman text along with all the Indian scripts makes the software especially attractive for bilingual applications involving an Indian language and English.

    Given below are some applications which may be readily handled by the software. Please note that the applications will support user interfaces in all the following scripts.

  Devanagari (for Sanskrit, Hindi and Marathi), Gujarati, Gurmukhi, Bengali, Oriya, Telugu, Kannada, Tamil, Malayalam and Roman Diacritics. 

  Urdu is supported, as are other scripts written right to left, such as Arabic, Hebrew and Avestan.

  A few scripts from the Southeast Asian region are also supported on account of the syllabic writing system used. Sinhala, Bali, Burmese and Tibetan are some examples.

  The editor also allows creation of text in Bharati Braille, the standard followed in India.


Document preparation

Automatic transliteration

Generating indexes and concordances

Roman Transliteration

Linguistic applications

Educational Aids

Multilingual clients

Manuscript preservation

Vedic Texts

Software for the Visually Handicapped

Braille in Indian languages

Setting up Web Pages

Email and Internet applications

Working with PERL

Multilingual Document preparation
    The IIT Madras editor program may be readily used to prepare text in all the Indian scripts. Text prepared using the editor may be imported into other applications such as word processors (e.g., Microsoft Word). The text may also be quickly converted to HTML (as well as PostScript and PDF) for display using standard web browsers. Seen below are screen shots of the early version of the multilingual editor, which uses curves to draw the characters on the screen, and the recent version of the same, which uses fonts.


  The editor supports a very rich set of aksharas including many not covered by standard coding schemes such as Unicode or ISCII. A point to keep in mind is that almost any desired representation for an akshara can be provided dynamically through externally specified parameters. Printouts of high quality may be produced via PostScript or through the word processors into which the text is imported. Data entry is natural and uniform across all Indian languages.

Multilingual editor

The Multilingual Editor from IIT Madras is a versatile application that supports a uniform user interface for data entry in all the Indian scripts. Scripts from South Asian Regions are also accommodated. As of July 2002, a special version of the editor handles Urdu, Arabic and other scripts written right to left.

The multilingual editor can be downloaded free of cost from this site.

Features supported in the editor

Editor Download page

Automatic transliteration across all Indian languages
     Applications which require the same text to be displayed in two or more languages simultaneously may be easily handled with the editor. Data entry need be effected for just one script and the text may be reproduced in other scripts automatically.  In the example below the first couplet was entered in Devanagari using the multilingual editor. All the others were automatically transliterated and added. 

  The automatic transliteration feature will be useful for preparing books which deal with the scriptures where knowledge of a particular script may not be required to understand the content.

   Incidentally, this is a couplet in Arthashastra that specifies the physical proportions which must be adhered to in erecting a pillar. Some food for thought for Civil Engineers!

Transliteration with the IITM software results in the most appropriate representation in the script for sounds that form the basis of the set of Aksharas supported in the software.  Sounds present in one language but not in the other are represented in the second case through shapes that are accepted as close equivalents.
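
The idea of entering text once and reproducing it in other scripts can be illustrated with a minimal sketch. It does not use the IITM coding scheme. It assumes instead that the text is held in Unicode, where the blocks for the major Indian scripts follow a largely parallel layout, so that a character can be moved from one script to another by shifting its code point. The function name `transliterate` and the table `BLOCK_START` are hypothetical names chosen for this illustration.

```python
import unicodedata

# First code point of each script's Unicode block. The blocks share a
# largely parallel layout, which is what this sketch relies on.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def transliterate(text, source, target):
    """Shift every character of the source script into the target block."""
    src = BLOCK_START[source]
    dst = BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            candidate = chr(cp - src + dst)
            # Keep the original character when the target script has no
            # character assigned at the corresponding position.
            if unicodedata.name(candidate, None) is not None:
                ch = candidate
        out.append(ch)
    return "".join(out)


# Devanagari "raama" rendered in Telugu.
print(transliterate("\u0930\u093e\u092e", "devanagari", "telugu"))
```

Where the target script lacks a sound, this sketch simply leaves the source character in place. A practical system would substitute the shape accepted as the close equivalent, as described above.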
Generating indexes and concordances for words
   The IIT Madras software includes programs for indexing texts so that concordances may be generated for the words in the text. The index generated may also be sorted to yield meaningful word indexes for further study of the manuscripts/text. Seen below is a portion of the concordances generated for words in Tirukkural, which consists of 1330 couplets. Each couplet consists of seven words and altogether there are about 6500 distinct words in the text of Kural. Shown are the words in sorted order with the number of the "Adhikaram" and the number of the couplet itself.
   It might appear that the sorting order is not maintained strictly in respect of the word seen on the last line. This is not a bug in the sorting algorithm. The space character in the last line has a code value higher than any Tamil akshara. Hence it falls after the other three words though one would expect it to be placed before the three. When text has to include letters which are not really part of the aksharas, it becomes necessary to treat them differently. In practice, it would be easy enough to handle the space before sorting the string. Likewise, the IITM coding scheme sorts the true consonant (i.e., a pure consonant without a vowel) by placing it after the vowel combinations. This is a matter of choice. The algorithm could equally place it at the beginning. Visitors who have learnt Tamil will appreciate this aspect of ordering the aksharas.
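
The remark about handling the space before sorting can be sketched as follows. The numeric codes are placeholders and not the actual IITM code values. The only assumption carried over from the text is that the space has a code higher than every akshara. The names `SPACE` and `collation_key` are hypothetical.

```python
SPACE = 200  # placeholder code for the space, higher than every akshara code


def collation_key(codes, space_first=True):
    """Return a sort key for a string given as a sequence of integer codes."""
    if not space_first:
        return tuple(codes)
    # Remap the space to -1 so that it sorts before every akshara.
    return tuple(-1 if code == SPACE else code for code in codes)


# Three entries sharing the first code; the second contains a space.
entries = [(10, 20), (10, SPACE, 30), (10, 50)]

raw_order = sorted(entries, key=lambda e: collation_key(e, space_first=False))
fixed_order = sorted(entries, key=collation_key)

print(raw_order)    # the entry with the space comes last
print(fixed_order)  # the entry with the space comes first
```

The same remapping trick could be used to move the pure consonant before or after the vowel combinations, since that placement is also a matter of choice.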

   Visitors to these pages interested in seeing the full list of concordances are encouraged to see the appropriate section under "Tirukkural: an online reference" .
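
A concordance of the kind described in this section can be sketched in a few lines. This is an illustration and not the IITM indexing program. It assumes the usual arrangement of ten couplets per Adhikaram, that words are separated by spaces, and that plain code order is acceptable for sorting (the IITM utilities maintain the proper order for the script). The function names are hypothetical.

```python
from collections import defaultdict

COUPLETS_PER_ADHIKARAM = 10  # assumption: 1330 couplets in 133 Adhikarams


def locate(couplet_number):
    """Return (Adhikaram number, couplet number) for a 1-based couplet number."""
    adhikaram = (couplet_number - 1) // COUPLETS_PER_ADHIKARAM + 1
    return adhikaram, couplet_number


def build_concordance(couplets):
    """Map every word to the list of places where it occurs, in sorted word order."""
    index = defaultdict(list)
    for number, text in enumerate(couplets, start=1):
        for word in text.split():
            index[word].append(locate(number))
    return dict(sorted(index.items()))


# Placeholder words stand in for the text of the couplets.
sample = ["alpha beta", "beta gamma"]
for word, places in build_concordance(sample).items():
    print(word, places)
```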

Sorting and Indexing utilities

(Proper sorting order for Indian scripts is maintained in the utilities)

Displaying Roman (ASCII) transliterated text
     For many years Roman transliteration has been used to represent text in Indian languages. The conversion utilities in the IIT Madras software may be effectively used in converting these texts into a form suitable for display in different scripts. For instance, the line shown below may be converted to give the display in Devanagari which follows the line. This is very useful for scholars preparing manuscripts in Indian languages who may not be very conversant with the scripts. They can type in the text in English and get it converted automatically. The IIT Madras software includes a utility known as "tconvert" to view ASCII text, prepared according to some transliteration scheme, in the desired Indian script. Several transliteration schemes are handled by the utility.
  Incidentally, the IITM Software will also permit text in English to be transliterated into Indian scripts. The words in the given text are converted into appropriate phonemes and displayed in Indian scripts.  Who will benefit from this? Apparently some of our politicians!
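
The conversion of Roman transliterated text into a script can be sketched as a table lookup with two rules: a vowel following a consonant is written as a vowel sign, and two consonants in succession are joined with a virama. The sketch below is not tconvert and does not follow any of the schemes it handles. The Roman scheme here is a hypothetical, much reduced one, the output is Unicode Devanagari, and all names are chosen for illustration.

```python
# Hypothetical, much reduced Roman scheme; not a scheme used by tconvert.
CONSONANTS = {
    "k": "\u0915", "kh": "\u0916", "g": "\u0917", "t": "\u0924",
    "d": "\u0926", "n": "\u0928", "p": "\u092a", "m": "\u092e",
    "y": "\u092f", "r": "\u0930", "l": "\u0932", "v": "\u0935",
    "s": "\u0938", "h": "\u0939",
}
# Each vowel maps to (independent form, dependent vowel sign).
VOWELS = {
    "a": ("\u0905", ""), "aa": ("\u0906", "\u093e"),
    "i": ("\u0907", "\u093f"), "ii": ("\u0908", "\u0940"),
    "u": ("\u0909", "\u0941"), "uu": ("\u090a", "\u0942"),
    "e": ("\u090f", "\u0947"), "o": ("\u0913", "\u094b"),
}
VIRAMA = "\u094d"


def tokenize(text):
    """Split Roman text into scheme units, preferring two-letter units."""
    tokens = []
    i = 0
    while i < len(text):
        for size in (2, 1):
            piece = text[i:i + size]
            if piece in CONSONANTS or piece in VOWELS:
                break
        else:
            piece = text[i]  # not part of the scheme; passed through
        tokens.append(piece)
        i += len(piece)
    return tokens


def to_devanagari(text):
    out = []
    pending = False  # True while the last consonant has no vowel yet
    for tok in tokenize(text):
        if tok in CONSONANTS:
            if pending:
                out.append(VIRAMA)  # join consonants into a conjunct
            out.append(CONSONANTS[tok])
            pending = True
        elif tok in VOWELS:
            independent, sign = VOWELS[tok]
            out.append(sign if pending else independent)
            pending = False
        else:
            if pending:
                out.append(VIRAMA)  # consonant without a vowel
            out.append(tok)
            pending = False
    if pending:
        out.append(VIRAMA)
    return "".join(out)


print(to_devanagari("raama"))
print(to_devanagari("karma"))
```

A two-letter unit such as "kh" is always read as one consonant in this sketch, so the cluster k followed by h cannot be written. Real schemes resolve such cases with additional conventions.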
a utility to convert Roman transliterated text into local scripts

Online transliteration Service

(Use this service to get your own copy of text in different Scripts)

Linguistic Applications
    The string processing library may be effectively used to perform sophisticated string processing in Indian language texts. Calculations involving word frequencies, the number of occurrences of specific characters, conjuncts etc. may be done very easily. Lexical analysis and parsing of sentences may also be performed with substantial ease.

   The system may be used to set up reference databases in the form of corpora which may be effectively used by other programs to study the structure of sentences in a language.

   Principles of Frequency Analysis

   Frequency analysis of text in Indian languages is aimed at identifying the most frequently occurring sounds rather than the letters of the alphabet. Since the writing systems specify rules for writing syllables, the analysis becomes intricate, requiring careful identification of sounds (basically conjunct aksharas). The IITM software includes well designed utilities for performing such analyses.
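
The difference between counting letters and counting aksharas can be seen in a small sketch. It is not one of the IITM utilities. It assumes Unicode Devanagari text and a simplified definition of an akshara: zero or more consonant plus virama pairs, a consonant, an optional vowel sign or final virama, and an optional modifier such as the anusvara, or else an independent vowel with an optional modifier. The names used are hypothetical.

```python
import re
from collections import Counter

CONSONANT = "\u0915-\u0939"  # KA to HA
VIRAMA = "\u094d"
VOWEL_SIGN = "\u093e-\u094c"  # dependent vowel signs AA to AU
MODIFIER = "\u0901-\u0903"   # candrabindu, anusvara, visarga
VOWEL = "\u0905-\u0914"      # independent vowels A to AU

# Simplified akshara pattern; real text needs more cases (nukta, Vedic marks).
AKSHARA = re.compile(
    f"(?:[{CONSONANT}]{VIRAMA})*[{CONSONANT}]"
    f"(?:{VIRAMA}|[{VOWEL_SIGN}])?[{MODIFIER}]?"
    f"|[{VOWEL}][{MODIFIER}]?"
)


def split_aksharas(text):
    """Return the aksharas found in the text, in order of occurrence."""
    return AKSHARA.findall(text)


def akshara_frequencies(text):
    """Count how often each akshara occurs in the text."""
    return Counter(split_aksharas(text))


# "karma" has four code points but only two aksharas: ka and the conjunct rma.
print(split_aksharas("\u0915\u0930\u094d\u092e"))
print(akshara_frequencies("\u0930\u093e\u092e \u0930\u093e\u092e").most_common())
```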

  The page relating to Linguistics and Computation has details on the utilities provided by IITM for linguistic processing of text prepared using the Multilingual Editor.

Of special interest to linguistic experts will be the frequency analysis utilities for tabulating the frequency of occurrences of aksharas in a corpus specific to a language. Results of frequency analysis of text from Bhagavadgita, Kural and Tevaram are presented as examples.

Educational aids in Teaching languages and Science
   The multilingual capabilities of the system may be effectively used in teaching one language through another. Added to this, the ability to set up web pages makes the system especially attractive for designing computer based training material for use in schools and educational institutions. The link at the right takes you to a sample lesson on Trigonometry (Pythagoras Theorem) in Hindi. The on-line lessons made available at this site for learning Sanskrit stand as excellent examples of educational material prepared using the IITM multilingual software.

Science lessons

Learn Sanskrit

Development of Multilingual client applications
   Large scale resource sharing across computer systems has been rendered easy on account of the concept of client server applications. The fundamental principle behind a client server application is that the user interface to the application is separated from the actual processing of the information. The IIT Madras software, with its library of string processing functions, is well suited for developing applications which make use of Indian language user interfaces. Such applications find use with databases, searching through archives of information, on-line references etc.

   Web based client applications are also easy to develop using the software tools provided. Here is a good example of a Java based web interface to a search application which allows you to look at concordances in Tirukkural, one of the early works in Tamil belonging to the Sangam period (about 300 B.C.).

Indian Postal Codes

On-line reference with script display in different languages. The codes are maintained in a MySQL database and can be queried to yield post office, district and state information for a specified Pincode and vice versa.
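
The two directions of lookup mentioned above can be sketched with a small table. The sketch uses SQLite from the Python standard library as a stand-in for MySQL. The table name, the column names and the single row are placeholders invented for illustration and do not reflect the actual database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pincodes ("
    "pincode TEXT, post_office TEXT, district TEXT, state TEXT)"
)
# Placeholder row, not real postal data.
conn.execute(
    "INSERT INTO pincodes VALUES (?, ?, ?, ?)",
    ("000000", "Sample Post Office", "Sample District", "Sample State"),
)


def by_pincode(db, pincode):
    """Return (post office, district, state) rows for a Pincode."""
    query = (
        "SELECT post_office, district, state FROM pincodes "
        "WHERE pincode = ? ORDER BY post_office"
    )
    return db.execute(query, (pincode,)).fetchall()


def by_post_office(db, name):
    """Return the Pincodes recorded for a post office name."""
    query = "SELECT pincode FROM pincodes WHERE post_office = ? ORDER BY pincode"
    return [row[0] for row in db.execute(query, (name,))]


print(by_pincode(conn, "000000"))
print(by_post_office(conn, "Sample Post Office"))
```

Since the values are passed as query parameters, names of post offices stored in an Indian script would be handled in the same way as names in English.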

Sanskrit Dictionary

Use this free online reference to the Monier Williams dictionary. This presentation is an excellent example of a search application in Indian languages.

Applications in the study and preservation of old manuscripts
    One of the important applications of the IIT Madras software relates to manuscript preservation. Rare palm-leaf manuscripts which are preserved (and should be preserved) are currently being transcribed manually. Many of these manuscripts were written in scripts which are no longer in use, e.g., the scripts in copper plates belonging to the early Chola period in South India.

    The ability of the IIT Madras system to render a given text in different scripts renders this job easy. A good example of this is the Grantha script which was used in South India for many centuries to write Sanskrit text. The IIT Madras software is able to handle the script well. 

    Seen below is a manuscript followed by its rendering in the Grantha script. This process, which requires a one time data entry, allows indexing of the text as well. The image may not show the details of the writing as we have reduced the size in generating the image. A few letters have also been omitted while transcribing. The example nevertheless shows the usefulness of the IITM software. You can see the same manuscript with much better clarity in the section on Palm-leaf manuscripts.


Presenting manuscripts on the web

An exposition of the methods which may be used to display old manuscripts and provide search capabilities to a client application to locate specific manuscripts.

Preparation of Manuscripts and documents containing text with Vedic symbols or special notation
   The multilingual editor is well suited for preparing and displaying Devanagari text containing Vedic notation. Four Vedic symbols are provided: Anudatam, Swaritam, Dheerga Swaritam and Kampitam (Kampa). These marks occur above, below or both above and below an akshara. These symbols are usually found in printed texts of the Rig Veda and Yajur Veda.

    The symbols for Sama Veda are supported only in the Grantha script in the present version of the editor as there are far too many of them to be accommodated within the set of glyphs for Devanagari. The "Sanskrit 1.2" font recommended for use with the editor is the only Devanagari font catering to the Vedic symbols as of now. Below you will see a typical display of text with Vedic symbols.


  Shown below are the beginning lines of the portion of SamaVeda referred to as "Ouhagana". The text is shown in Grantha using fonts developed at IIT Madras.

Vedic notation
(A brief introduction)
Preparation of text with music notation for South Indian classical music
   The rich character set supported by the IIT Madras software permits music notation to be handled for South Indian (Carnatic) music. The symbols included here correspond to the twelve different notes or swaras in five different octaves. Also supported are symbols to specify Kampitam, Jaru and the duration of the note from the eighth of the interval to one fourth and half of the interval. Symbols like Tala marks are also supported. The music notation is based on the recommendation made by Dr. Sambamurthy in his book series on South Indian music. Currently, music notation is supported only for Tamil and the "iitmtam" TrueType font.

   During the early part of the twentieth century, Subbarama Dikshitar, a member of the Muthuswamy Dikshitar lineage, had recommended a notation for Carnatic music as played on the veena.  It is not clear if there is any interest among musicologists of today to retain this notation. The development team at IIT Madras would be interested to hear from experts.

 The development team has not pursued this application seriously, for musicologists and musicians seem to have widely differing views on the subject. 
Software for the visually handicapped
   During the year 1998, the IIT Madras software was enhanced to include support for use by the visually handicapped. A special version of the Multilingual editor has been developed which features text to speech output and appropriate audible responses for almost all the selections in the menu items. Characters are spoken as they are typed in and so are words and whole lines besides the full text of the document itself. A visually handicapped person will be able to use this editor meaningfully for quick and effective data entry both in the vernacular and in English.

   Screen readers are also available for reading text in Indian languages. On account of the phonetic nature of the languages of the country, a single screen reader works well with all the languages, though as of now, intonation issues have not been fully taken care of. Using the screen reader, a visually handicapped person will be able to hear the text in a .llf file which may include English words and sentences. Screen readers have been developed for the Microsoft Windows platforms as well as for Linux. The speech engine used is the freely available computer program known as MBROLA, developed at an educational institution in Belgium. Our page on on-line demos has a link to a file containing sample output speech in our languages.

Here is a Video Clip of the speech enhanced editor in operation 

Sound enhanced multilingual editor
Multilingual data preparation application for use by the visually handicapped.

Jaws for Dos

Free screen reader application adapted to work with Windows and Linux

Sound enhanced Lynx

Text based web browser enhanced to support screen reading features

Braille output in Indian languages
   Another useful application of the IITM software is in generating Braille output in Bharati Braille, the adaptation of the six dot system for Indian languages. Multilingual text prepared by the IITM editor may be instantly converted into Bharati Braille and embossed on a standard Braille embosser either on-line or off-line. Hence lessons in Braille may be prepared and distributed in a form ready for embossing.
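
The conversion to six dot Braille can be sketched as a table lookup from characters to cells. The sketch is not the IITM converter. It represents each cell as a character from the Unicode Braille Patterns block, where dot n corresponds to bit n-1 above U+2800. The three table entries are given for illustration only and should be checked against a Bharati Braille chart. Vowel signs, conjuncts and the halant are left out altogether. All names are hypothetical.

```python
BRAILLE_BASE = 0x2800  # first code point of the Unicode Braille Patterns block


def cell(*dots):
    """Return the Braille cell with the given dots (numbered 1 to 6) raised."""
    bits = 0
    for dot in dots:
        if not 1 <= dot <= 6:
            raise ValueError(f"dot number out of range: {dot}")
        bits |= 1 << (dot - 1)
    return chr(BRAILLE_BASE + bits)


# Illustrative entries only; to be verified against a Bharati Braille chart.
SAMPLE_TABLE = {
    "\u0915": cell(1, 3),        # Devanagari KA
    "\u092e": cell(1, 3, 4),     # Devanagari MA
    "\u0930": cell(1, 2, 3, 5),  # Devanagari RA
}


def to_braille(text, table=SAMPLE_TABLE):
    """Replace every character found in the table; leave the rest unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


print(to_braille("\u0915\u092e"))
```

Since Bharati Braille assigns cells to sounds, one table per script mapping onto the same cells would be enough to emboss text from any of the supported languages.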

   The multilingual editor with sound and Braille output facility has become very useful for the visually handicapped in the city of Chennai. There is much potential in this editor for providing learning resources for the disabled and the acceptance of the editor by the society has given us immense satisfaction. 

Image of editor page displaying braille

Braille for Indian languages, a simple introduction

Useful introduction to the Braille standard in India

Online services for the Visually Handicapped

Use these services to gain access to school and college text books prescribed for different classes and educational programs in major universities.

Specific educational institutions may also use these services to get documents printed in Braille.

Setting up Web pages (and web sites) catering to Indian Languages
  One of the most important applications for the IITM software relates to working with Indian scripts on the World Wide Web. The ability of the software to support data entry in a uniform manner across all the languages allows quick and effective means for setting up multilingual web pages. The host of utilities available with the software permits interactive web pages to be set up as well, through Java applets. Utilities to present Indian language text in the form of images or PDF files have far reaching consequences in respect of making Indian scripts viewable on the web without need for special software, fonts or viewers.

Tutorial on Setting up Web pages in Indian scripts
Email and Internet applications in Indian Languages
  Email is one stable application which has retained its simplicity and elegance, even as other internet applications have become sophisticated and complex. The development team responsible for the IITM software realized the importance of this application early enough to provide support for it. There are two ways of looking at email in Indian languages.

     1. Contents of the message are in Indian scripts but the user interface to the email application continues to be in English. This is by far the most common approach, taken by the majority of the email programs supporting email in Indian languages.

     2. The user interface to the email program is also in Indian languages besides allowing the contents to be typed in our scripts. This approach will help persons not knowing English to send and receive email messages in local languages. As of today (October 2005) few email applications provide this support.

   The support for email provided by the IITM software includes both the above. One might wonder, however, how applications handling email in Indian languages can deal with email addresses which are specified in English. This is easily handled through "aliases" which can be set up with Indian language strings. The mail interface is a simple local language text based interface similar to the text based email client "mail" familiar to users. Seen below is a screen shot of the email client running under Internet Explorer on Windows systems.

  Detailed information in respect of email in Indian languages is included in a separate section of these pages.
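
The alias mechanism mentioned above can be sketched as a lookup from an Indian language string to an address. This is an illustration and not the way the IITM email client stores its aliases. The alias, the address (in the reserved example.com domain) and the function name are hypothetical.

```python
# Hypothetical alias table: a name typed in Devanagari maps to an address.
ALIASES = {
    "\u0930\u093e\u092e": "ram@example.com",
}


def resolve(recipient, aliases=ALIASES):
    """Return the email address for a recipient given as an address or an alias."""
    if "@" in recipient:
        return recipient  # already an address
    try:
        return aliases[recipient]
    except KeyError:
        raise LookupError(f"no alias defined for {recipient!r}") from None


print(resolve("\u0930\u093e\u092e"))
```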

 Email in Indian languages

The approach to handling email in Indian languages is based on the use of appropriate fonts and rendering text in html format. Such text is easily displayed if the specified font is available in the system. Most email services in India which allow email in local scripts use this method. While this is useful, the user has to interact with the system in English only since the basic email application continues to be an English based one.

Perl Modules to work with Indian language text

  Perl has retained a unique position in the computing world as an ideal choice for text processing on the web. The text representation used in the IITM Software lends itself to easy string processing using the features of PERL. The IITM Software includes support for Indian language text in standard PERL programs through a specially designed module. With this module, web servers can use PERL scripts to process Indian language text and return useful results to client applications running from a web browser.

A separate section at this site is devoted to discussing PERL modules to work with Indian language text.

Sample PERL applications
