History of the IITM project

It would be of interest to know something about the Systems Development Laboratory in the Department of Computer Science and Engineering at IIT Madras. This lab has distinguished itself as a unique student-managed laboratory in the Institute. Started around the time the 8080 CPU was introduced along with a small development kit known as the SDK80, the lab was christened the hardware lab of the department! Early projects in the lab centered around building small systems for educational use, specifically to allow students to learn about the underlying principles of computer systems. Though IIT Madras had at that time one of the state-of-the-art machines (an IBM 370/155), one could never look inside the machine, much less put an oscilloscope probe inside to look at waveforms! One of the earliest projects in the lab was the design and implementation of a digital module for drawing Bezier curves (please see the image in the acknowledgment page), the idea being the possibility of generating characters of Indian scripts.
It took a few years before bit slice processors were available in the country. The earlier 8080 based hardware was modified to work with the AMD2900 series and a simple system was demonstrated in 1982. The paper "An approach to character generation using cubic splines", presented at the IEEE Consumer Electronics conference in 1983, won the second place outstanding paper award and paved the way for deeper interest among the students in working towards systems for displaying Indian scripts.
The availability of the PC in India made a big difference, with the possibility of running the curve generation algorithm in software at almost the same speed that the AMD2900 was providing. The students of the lab had by now mastered the art of drawing curves and generating characters, but the lab had virtually no PCs since the systems being built in the lab were 68000 based Unix machines with graphical support. This was the period known for the "six versus eight" fight (the superiority of the 68000 CPU over the 8086!). The earliest PCs procured for the lab were XTs, and in 1988 the first attempt at computing with Indian scripts was made by designing and implementing an interpreter for a BASIC-like language written in Tamil or Telugu. The characters would not be displayed through fonts but drawn on the screen using curves. Independence from the vagaries of fonts was of great importance to the students, who could use a simple incremental algorithm to stroke the characters independent of the script. At the same time, the need to represent syllables rather than shapes was recognized and the system used a sixteen bit code for the syllables of Tamil and Telugu.
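The page does not say which incremental algorithm the students used. One common incremental scheme for cubic Bezier curves is forward differencing, which computes each successive point on the curve with additions only. The sketch below assumes that technique; the function name and parameters are illustrative and are not taken from the IITM software.

```python
def bezier_forward_difference(p0, p1, p2, p3, steps):
    """Return steps + 1 points on a cubic Bezier curve.

    Assumption: generic forward differencing, not the lab's actual code.
    p0..p3 are (x, y) control points; steps is the number of segments.
    """
    h = 1.0 / steps
    state = []  # one [value, d1, d2, d3] list per coordinate axis
    for c0, c1, c2, c3 in zip(p0, p1, p2, p3):
        # Power-basis coefficients: B(t) = a*t^3 + b*t^2 + c*t + d
        a = -c0 + 3 * c1 - 3 * c2 + c3
        b = 3 * c0 - 6 * c1 + 3 * c2
        c = -3 * c0 + 3 * c1
        d = c0
        # First, second and third forward differences at t = 0
        d1 = a * h**3 + b * h**2 + c * h
        d2 = 6 * a * h**3 + 2 * b * h**2
        d3 = 6 * a * h**3
        state.append([d, d1, d2, d3])

    points = []
    for _ in range(steps + 1):
        points.append(tuple(s[0] for s in state))
        for s in state:
            # Advance one step using additions only
            s[0] += s[1]
            s[1] += s[2]
            s[2] += s[3]
    return points


# Usage: nine points along an arch-shaped stroke
stroke = bezier_forward_difference((0, 0), (0, 1), (1, 1), (1, 0), 8)
```

After the set-up of the three differences, the inner loop needs no multiplications, which is what makes this kind of scheme attractive on small hardware and early PCs.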
The approach taken was not unlike the Metafont approach suggested by Prof. Knuth, but the rendering was done in IIT's own way. TeX was a favourite with many academically motivated developers but unfortunately interaction was out of the question. In the IIT experiment, the representation of characters using sixteen bits was done in a manner which made it possible to quickly identify the strokes needed to generate the character. As many as four different shapes, each made up of up to 16 curves, could combine to generate a composite shape for a syllable. The result was that one could get the system to provide an interactive user interface so that computing with Indian scripts could be made possible. Though a simple BASIC-like interpreter (written using Turbo-C) was demonstrated, the real need was not a programming environment but one in which applications would be available to users for data preparation and processing.
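The page gives only the limits (up to four shapes per syllable, up to 16 curves per shape) and not the actual tables or bit layout. The sketch below shows one way such a lookup could be organized. The `Shape` class, the two tables and `strokes_for_syllable` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Curve = Tuple[Point, Point, Point, Point]  # control points of one cubic curve

MAX_CURVES_PER_SHAPE = 16    # limit stated in the text
MAX_SHAPES_PER_SYLLABLE = 4  # limit stated in the text


@dataclass
class Shape:
    """One reusable shape: a list of at most 16 curves (hypothetical layout)."""
    curves: List[Curve]

    def __post_init__(self):
        if len(self.curves) > MAX_CURVES_PER_SHAPE:
            raise ValueError("a shape holds at most 16 curves")


def strokes_for_syllable(code: int,
                         shape_ids_by_code: Dict[int, List[int]],
                         shapes: Dict[int, Shape]) -> List[Curve]:
    """Collect every curve needed to draw the syllable with this 16-bit code."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("syllable codes are sixteen bits wide")
    shape_ids = shape_ids_by_code[code]
    if len(shape_ids) > MAX_SHAPES_PER_SYLLABLE:
        raise ValueError("a syllable combines at most four shapes")
    return [curve for sid in shape_ids for curve in shapes[sid].curves]
```

With tables like these, going from a code to its strokes is a dictionary lookup followed by a short loop, which fits the interactive use described above.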
It was in the light of this requirement that the present project was conceived. The very first application developed using the curve drawing approach to displaying text, along with the syllable level internal representation, was a screen editor under DOS which worked on a set of application calls to provide input and output functions to a user application. These functions were much like the getch() and putch() functions of C but allowed us to input strings in Indian scripts. These functions constituted what the students called the local language library. They had these basic functions named lgetch() and lputch() to indicate the local approach to input and output. Using the library of functions, one could write a variety of applications which would work uniformly across the languages of India. In 1993, a gopher client was built and it established the feasibility of developing many useful applications supporting user interfaces in Indian languages by modifying the character processing routines of standard applications to work with two byte codes. The students also demonstrated a client to work with Oracle, allowing queries to be made in Indian scripts and the results displayed.
Around this time (1993), there were quite a few operating systems in use but DOS was yielding to Windows 3.x and Unix. This gave the students an opportunity to port the system to several machines including the Macintosh. The lab was fortunate to gain the friendship of Prof. Frank Starmer of Duke University, who spent a sabbatical year at IIT Madras, and Prof. Sankara Rao of the University of North Dakota, who graciously agreed to provide space on their systems and allow the IITM software to be made available via ftp to others. This was a time when IIT Madras had just one 9600 baud line to the net, catering just to email services.
By this time, over fifteen students had contributed to the development of the software and many new applications were being envisaged. During 1994-96, the library functions were standardized, as was the global set of aksharas across the Indian languages. A line editor (led), a viewer (lb) and a printing utility were developed for as many as six platforms and these were subsequently distributed from the Duke web site (http://taylor.mc.duke.edu/~krishnan/).
Subsequently, the lab was fortunate to get a 486 machine and could host a small web server whose purpose was to serve on-line lessons to learn Sanskrit. The multilingual Editor "led" was enhanced with additional features and was used to prepare the text for the lessons. This service had won much appreciation from the user community.
The on-line Sanskrit lessons established something very important. That Indian language text could be easily displayed on the web without having to install special software was perhaps the most important observation. That the IITM software had the best features to perform linguistic processing as well was another important observation. The lab's web server, which was known as http://sdlcfsn.cs.iitm.ernet.in/, was named to honour Prof. Charles Frank Starmer, who had done much to help the lab gain visibility on the net. Most persons who have come to know about the IITM software actually got the details from the Duke site, which later moved to the Medical University of South Carolina, along with Frank Starmer. For historical reasons, the early pages with characters drawn through curves were maintained at that time at http://www.musc.edu/~krishnan
During the summer of 1997, Prof. Raj Reddy of CMU, who had visited IIT Madras, saw the development and immediately recognized the strength of the syllable level coding. He felt strongly that it was time for the lab to start working with fonts so that the standardization that was being effected on the net could be honoured by the IITM software. Though the lab was fully aware of the vagaries of fonts and specifically the chaotic situation in respect of Indian language fonts, the students were convinced that the software would indeed gain strength by providing output formats consistent with the support provided by the newer systems, specifically Win95.
The newer versions of the editor and related software, which work well on Win95 systems, actually utilize the full complement of glyphs supported in TrueType fonts. However, for the text to be rendered properly on other systems, compatibility with ISO-8859-1 has been forced. Also it must be stated that the lab continued to recommend the syllable level coding for the aksharas though elsewhere in the world, Unicode was being recommended. Microsoft and other developers continue to provide support for Indian languages primarily through a language enabling process rather than a language localization process.

  Our stand on Unicode is reflected in the observation "Unicode for Indic Scripts requires the Application Programmer to understand how a syllable should be rendered. Application Programmers are thus expected to thoroughly comprehend the Orthography of the script for the language. Being a variable length code, Unicode is not easily amenable to linguistic text processing".
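To illustrate what "variable length" means at the syllable level, the sketch below counts the Unicode code points and UTF-8 bytes in three Devanagari syllables. The choice of examples and the helper names are ours, added for illustration. Under the syllable level coding described on this page, each of these syllables would instead occupy a single sixteen bit code.

```python
# Each string is one syllable (akshara) as a reader sees it.
SYLLABLES = {
    "ka": "\u0915",                      # KA
    "ki": "\u0915\u093F",                # KA + VOWEL SIGN I
    "kshi": "\u0915\u094D\u0937\u093F",  # KA + VIRAMA + SSA + VOWEL SIGN I
}


def code_point_count(syllable: str) -> int:
    """Number of Unicode code points used to spell one syllable."""
    return len(syllable)


def utf8_length(syllable: str) -> int:
    """Number of bytes the same syllable takes when stored as UTF-8."""
    return len(syllable.encode("utf-8"))


for name, text in SYLLABLES.items():
    print(name, code_point_count(text), utf8_length(text))
```

Because the count differs from syllable to syllable, an application that indexes or slices such text by code point can cut a syllable in the middle unless it first works out where the syllable boundaries lie.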
1997 also brought in an important development in the lab, that of synthesizing the sounds of the aksharas. The syllable level coding made it possible for the students to experiment with different synthesis schemes. Systems such as the Festival speech synthesis system or the Klatt synthesizer were initially used by the IITM software to directly go from akshara to sound but the rendering of the phonemes was not satisfactory, being limited to either American or British speech.

  The MBROLA system was an ideal choice and it would not be an exaggeration if it is said here that the very first continuous text to speech application in Indian languages was demonstrated in just three days after a version of MBROLA for win95 was downloaded. A bit of experimentation with the different data bases allowed the students to finalize the choice on the Swedish data base. Today (July 2001), there is indeed a Hindi data base for use with MBROLA but it is somewhat inadequate when it comes to generating conjuncts. The recently added Telugu Data Base also supports a very restricted set of diphones (August 2002).
Clearly the lab has to work with other groups to develop meaningful data bases for all the different Indian languages. The absence of proper recording resources has hampered this activity but hopefully we will be able to work with other groups.
Enhancing the Indian language applications with speech is very easily accomplished in the IITM software when new applications are developed. The speech enhanced multilingual editor was one of the first applications developed at the lab for the benefit of visually handicapped persons in India. This application allows a visually handicapped person to master data entry in Indian languages and thus prepare himself/herself for higher education and meaningful employment. At the same time, the syllable level coding matched the requirements of Bharati Braille and preparation of documents in Braille could be easily accomplished.
Senior citizens of Chennai, who had read about the software through newspaper articles, had proposed that volunteer groups be formed to promote the use of the software for the benefit of the disabled and underprivileged. Thus was born Vidya Vrikshah, the volunteer organization in Chennai, which now stands as a fine example of a group of volunteers who have actually demonstrated that IT does indeed hold much promise for literacy and education in the country, if approached through the mother tongue.

The organization conducts monthly training programs to train visually handicapped persons in the use of computers. This program, given free of charge, has attracted several hundred people from different parts of the country. Recently, the group of experts from different organizations for the disabled, who attended the INTEND 2001 conference, wholeheartedly endorsed the IITM software for large scale use within the country.

Here are some additional details relating to the project.
During the past fifteen years, approximately sixty students have contributed to the development of the software. The majority of the students were undergraduates who took up the work out of a conviction that something meaningful could be achieved. Though in many cases the work related to their undergraduate project, their involvement was deeper since they spent more than one year in the lab, getting trained first before continuing the development.
The IITM software project has been unique in many respects within the IIT system. It is the first project of its kind run entirely by the students over a period of a decade. It is the very first project ever in the IIT system where a product directly usable by the people of the country has been designed, built and delivered to the people.
The project, by design, has not been funded by any Government or private organization. IIT Madras is the only Institution among the IITs which has refrained from requesting funds from the ministry for technology development in Indian languages. This has given the lab the freedom to make the software available free of charge to the people. It has also given the students an opportunity to work on socially relevant problems and provide workable solutions as opposed to developing prototypes which would require further work to make them usable.
It is very clear that commercialization of any product, especially software, would render it unreachable to the section of the community that should truly get access to it. The students of the lab are quite convinced about this. The free distribution of the software by IIT Madras cannot be likened to the free distribution of software developed at many academic institutions of the world. The purpose is to make available to someone the basic means for gaining literacy. It is a different question of course if people choose to ignore the software. We do hope that this will not happen.
A related issue is that the development at the lab has not been publicized or made known through academic channels. The only channel of information transfer has been the lab's web server which carries the on-line Sanskrit lessons. Also, the question of why the lab has not worked with other groups in the country needs to be addressed. The answer is easy to provide. It is not unusual for heavily funded academic projects to gain national visibility. The destiny of what results from a project is often decided by the funding agency and not the group which develops the technology. The question of working with other groups just did not arise because almost all of them run funded projects and would find it difficult to freely share the information. This is an important ideological difference that must be reckoned with. This would explain the conspicuous absence of IIT Madras' name when IT in Indian languages gets discussed at a national level.
On a philosophical note, it is not the technology that matters. It is how the technology is actually used by the people that really counts. The IITM project has consciously addressed the latter issue, while most projects have concentrated on the former.
Resources for continuing the development were frequently added by students themselves during their visits to the lab. In many cases, the students of the lab who had pursued graduate study in the U.S. would bring back books, peripherals and other development software and thus bless the continued development of the project. This is in contrast with the normal IIT approach to resource generation through sponsored research. The end result of the approach taken by the lab is extremely satisfying: a socially relevant problem receives attention and a solution as well.

How we have progressed over the years

1979
A TTL logic based controller generates Bezier curves for display on an Oscilloscope.

1983
AMD bit slice CPUs used for building the curve generator.

1989
Characters drawn on a PC and sixteen bit codes proposed.

1991
A Screen Editor developed for different platforms. Sixteen bit codes are used. A C-callable library is made available.

1993
Client Server applications demonstrated. Email and Gopher clients made possible.

1994
C callable library refined and ported to Win3.x, SunOS and X-Windows under Linux.

1995-1996
Applications developed for text processing and search engines built. Prof. Frank Starmer graciously offers space on his machine at Duke to host web pages for the IITM Software.

1997
Lab's Web site is set up.
http://sdlcfsn.cs.iitm.ernet.in/
On-line Sanskrit Lessons put up at the web site.

Volunteer Organization Vidya Vrikshah uses the IITM Software to prepare texts of scriptures.

1998
First version of the Fonts based Editor developed for Win95. Uses MFC.

1999
Text to Speech demonstrated as well as Braille output from Indian language documents.

Vidya Vrikshah starts training programs for the Blind using the Multilingual editor with speech output.

2000
Speech output included into other applications.

2001
Lab's web site renamed
acharya.iitm.ac.in

INTEND2001 conference recommends IITM Software as a National Solution.

2002
The Multilingual Systems project gains international recognition through its nomination for the Stockholm Challenge 2002 award. We gratefully acknowledge this honour.

The version of the Editor under Linux gets ready and is distributed from the Acharya server.

2003
The Multilingual software is selected for a national award given by the Ministry of Social welfare and Empowerment, Government of India.

Prof. Kalyana Krishnan receives the award from the President, Dr. Kalam.

2004
IIT Madras and Vidya Vrikshah together initiate two national level projects.

1. National Initiative for the Blind (NIB)

2. The Vikas project (Village Information Knowledge And Skills)

2005
IITM Software is put up as a project under Open Source at
imli.sourceforge.net

Please visit the credits and Acknowledgments page for the names of students who contributed to the project.

A set of photographs over the years will surely bring smiles!

Last updated on 08/14/20