Programming PERL with Indian Languages
Introduction (and a bit of history)
  In the context of computing with Indian languages, people have often asked, "Can you not allow computer programs to be written in Indian languages?" Presumably, the idea is to allow the development of applications in Indian languages. Nearly two decades ago, it was reported that Sanskrit is the most appropriate language for writing computer software. The report, quoted as having been published in Forbes magazine, seems to have caught the attention of scholars all over India. Unfortunately, no one seems to have asked whether this was feasible. Worse still, Forbes magazine never carried such a report.

  The topic of writing computer programs in Sanskrit merits separate discussion and will not be attempted here. There is certainly the possibility, however, of allowing statements written in Indian scripts to be interpreted by a suitable computer program, resulting in some meaningful response. In simple cases, this may be a natural language query to a database, composed in Tamil, Bengali or even Sanskrit. Yet a proper programming language based on Sanskrit, beset as it is with problems of text representation and linguistic processing, may only be a distant reality.

  The Systems Development Laboratory, IIT Madras, which has contributed significantly to the development of multilingual computing (and hence to a meaningful IT solution for India), did think of applications which one could write in a regional language, along the lines of programming with an interpreted language such as BASIC. In fact, as far back as 1989, an equivalent of BASIC was shown to be viable in Tamil and Telugu. The package consisted of an editor and an interpreter which would execute the statements prepared using the editor (a form of integrated environment). The lab realized, however, that large systems could not be easily developed using this approach and concentrated on other applications that people in India could readily use.

Current Thinking

  Subsequent to the development of a system for efficient string processing with Indian language texts, the lab proposed the idea of "scripting" with Indian languages rather than a programming language as such. The lab feels that the concept of a "C in Tamil", "PASCAL in Bengali" etc. will make little sense in the context of application development across different platforms and different regional languages. Any IT solution for India should allow a uniform approach to dealing with all the languages. Concepts such as those mentioned above will be useful only as exercises for college students who may want to learn to write a simple compiler or a system program. One cannot produce a Tamil or Telugu based programming language that will allow the kind of complex development needed for supporting user interfaces in Indian languages.

  The view held by the developers at the lab is that the best approach to computing with Indian languages is to develop all the software at the level of applications, using standard development tools so that they may run on different systems. The idea of an operating system in an Indian language is just not meaningful, since what we require are applications which people can run in their mother tongue. Even if we concede that we too can have our own versions of systems, just as there are Chinese and Japanese versions, it will be years before people have the right set of tools to work with Indian languages.

  It is important to remember that the primary task faced by educators in India is getting people to read and write in their languages and not turning out programming professionals. If a programming environment is indeed warranted, it could be provided through "scripting languages" which support useful abstractions for accomplishing most tasks dealing with numbers or text based data. The idea of scripting is not new. One sees fine examples of scripts in the Unix environment which accomplish many interesting tasks. The Macintosh programming environment includes scripting of communication programs as well.

  Scripts were traditionally easy to write but slow in execution. PERL has changed all that. PERL has developed into an important programming resource for efficient text processing, not only for web-based applications but also for standalone applications on a variety of systems. PERL is a remarkably good choice for writing applications which would interpret scripts written in Indian languages. Very little is required by way of enhancements to standard PERL, which handles regular expressions with great ease and simplicity.

  The fixed-size two-byte encoding used in the IITM software lends itself to direct manipulation using PERL. The only enhancement required in PERL for this is a simple module which can present "llf" characters as equivalent ASCII strings. Such a module has been developed in the lab and is known as "llperl". This module provides support for processing text prepared with the IITM multilingual editor. The idea behind this approach is to permit PERL programs to be written using the IITM editor, so that text strings in Indian languages can be present in them.
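  The idea of presenting fixed-size two-byte characters as ASCII strings can be sketched as follows. This is an illustration in Python, not the llperl module itself. The actual "llf" byte layout and the ASCII form used by llperl are not described here, so the sketch assumes that each character is a big-endian 16-bit code and that it is presented as a four-digit hexadecimal token; the function names are hypothetical.

```python
import re

CODE_WIDTH = 2   # bytes per character (fixed-size two-byte encoding)
TOKEN_WIDTH = 4  # ASCII hex digits per character (assumed mapping)


def to_ascii_tokens(raw: bytes) -> str:
    """Present each two-byte code as a four-digit hex ASCII token."""
    if len(raw) % CODE_WIDTH != 0:
        raise ValueError("input length must be a multiple of two bytes")
    return "".join(
        f"{int.from_bytes(raw[i:i + CODE_WIDTH], 'big'):04x}"
        for i in range(0, len(raw), CODE_WIDTH)
    )


def from_ascii_tokens(text: str) -> bytes:
    """Invert to_ascii_tokens: turn hex tokens back into two-byte codes."""
    if len(text) % TOKEN_WIDTH != 0:
        raise ValueError("token string length must be a multiple of four")
    return b"".join(
        int(text[i:i + TOKEN_WIDTH], 16).to_bytes(CODE_WIDTH, "big")
        for i in range(0, len(text), TOKEN_WIDTH)
    )


def count_code(text: str, code: int) -> int:
    """Count occurrences of one character code in a token string.

    The regular expression consumes the string four digits at a time,
    so a match can never straddle two neighbouring characters.
    """
    token = f"{code:04x}"
    return sum(1 for t in re.findall(r"[0-9a-f]{4}", text) if t == token)
```

  Because every character occupies the same number of ASCII digits, ordinary regular expressions and string functions can operate on the token string without any knowledge of the script, and the original bytes can be recovered exactly afterwards.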

  Those intrigued by this will find that it is indeed very easy to do. Given below is a simple PERL program; it does not take long to recognize that it is an echo program, which prints on the console the text string entered by the user.

  One would invoke this program (which could be given a name like llecho_pl.llf) as
$ lperl llecho_pl.llf
   where lperl is a preprocessing utility which converts the .llf file into an ASCII file conforming to the requirements of a standard PERL program, and then invokes PERL to handle the converted file.  A typical use of the utility may show up like this on the console screen.
     The command shell from which the lperl utility is invoked is not the standard Unix shell, which would not allow data entry in Indian scripts. A special Indian language based command shell has been used here under Linux.  This shell allows interaction with the user in different languages (one language at a time) and can display text in all the scripts. Using the command shell, one can invoke other lperl programs, such as a sorting program or an email client, and keep the interaction on the screen entirely in the regional language.
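  The two steps performed by a preprocessing utility of this kind (convert the source file to ASCII, then hand the result to the interpreter) can be sketched as below. This is a Python illustration under stated assumptions, not the lperl utility: the real .llf to ASCII conversion is not specified here, so convert_to_ascii is a placeholder that simply rewrites every byte outside 7-bit ASCII as a \xNN escape, and run_preprocessed is a hypothetical name.

```python
import os
import subprocess
import tempfile


def convert_to_ascii(raw: bytes) -> str:
    """Placeholder conversion (assumption, not the real .llf mapping).

    ASCII bytes pass through unchanged; every other byte becomes a
    backslash-x escape with two hex digits.
    """
    return "".join(chr(b) if b < 0x80 else f"\\x{b:02x}" for b in raw)


def run_preprocessed(source_path, interpreter, capture=False):
    """Convert source_path to ASCII and run the interpreter on the result.

    interpreter is a command given as a list, for example ["perl"].
    Returns the subprocess.CompletedProcess of the interpreter run.
    """
    with open(source_path, "rb") as src:
        ascii_text = convert_to_ascii(src.read())

    # Write the converted program to a temporary file for the interpreter.
    fd, tmp_path = tempfile.mkstemp(suffix=".pl")
    try:
        with os.fdopen(fd, "w", encoding="ascii", newline="") as tmp:
            tmp.write(ascii_text)
        return subprocess.run(
            [*interpreter, tmp_path], capture_output=capture, text=True
        )
    finally:
        os.remove(tmp_path)  # the converted file is only an intermediate
```

  Assuming a perl executable is on the search path, a call such as run_preprocessed("llecho_pl.llf", ["perl"]) would mirror the command line shown above, with the converted file kept only for the duration of the run.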

The enhancements to PERL have been incorporated in version 5.0 of PERL for the Linux platform. A suitable command shell supporting Indian language based interaction is also provided.

The llperl module is available for the Linux platform.

Sample PERL programs may be seen in the linked page.

PERL under MSWindows

Though PERL has been ported to the Win9x/2000/XP platforms, there are some tricky problems to deal with in the implementation of shared memory. Shared memory is required to allow a local language based command shell to work with PERL. This is being examined, and quite likely a local language based command shell that can share data with PERL modules will be available as part of the IITM software at a later date.


Last updated on 11/07/12