The Multilingual Editor
The purpose of the Multilingual Editor is to allow easy preparation of text in all
the Indian languages so that many different applications can utilize the
text. An important aspect of the text prepared using the editor is the
representation of the text in a form suited for easy and effective linguistic
processing. The Editor supports a uniform user interface across all the
languages/scripts and allows a number of flexible data entry schemes.
The Editor package
also includes utilities to convert the representation into formats compatible
with other applications. Text prepared using the Editor can be taken
to Word (or other similar applications) to produce very high quality printed
documents. The main idea behind the design of the Editor
is the concept of "One program for all of India". The program has achieved
this distinction by also supporting Urdu, which is included in the list
of national languages.
The version of the
Editor described here is meant for use on Microsoft Windows based systems.
A version for Linux includes the same features and is discussed separately.
Features of the Editor
Text preparation using
the recommended data entry methods may be mastered in just a few hours.
Four different data entry schemes are available for all the scripts except
In addition to the above,
a data entry scheme recommended and standardized for Tamil during the
Tamilnet99 conference is also supported.
Seen below is
the phonetic mapping scheme standardized at IIT Madras. The script used
in the illustration is Devanagari. The mapping accommodates about 58 basic
vowels and consonants across eleven languages. The mapping shown covers
aksharas from all the languages.
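
A phonetic mapping of this kind can be sketched as a table lookup from Roman keystrokes to aksharas. The sketch below is only an illustration: the key assignments are hypothetical and are not the IIT Madras mapping, the table covers a handful of letters, and Unicode Devanagari is assumed for the output, whereas the Editor uses its own internal representation.

```python
# Hypothetical Roman-key tables (NOT the IIT Madras mapping).
CONSONANTS = {"k": "क", "kh": "ख", "g": "ग", "t": "त",
              "n": "न", "m": "म", "r": "र", "l": "ल"}
# Each vowel maps to (independent form, dependent sign after a consonant).
# The inherent vowel "a" has no dependent sign.
VOWELS = {"a": ("अ", ""), "aa": ("आ", "ा"), "i": ("इ", "ि"), "u": ("उ", "ु")}
VIRAMA = "\u094d"  # suppresses the inherent vowel, joining consonants


def to_devanagari(keys):
    """Convert a Roman keystroke string to Devanagari, longest match first."""
    out = []
    i = 0
    after_consonant = False  # True while a consonant still awaits its vowel
    while i < len(keys):
        for length in (2, 1):
            chunk = keys[i:i + length]
            if len(chunk) < length:
                continue
            if chunk in CONSONANTS:
                if after_consonant:
                    out.append(VIRAMA)  # consonant + consonant forms a conjunct
                out.append(CONSONANTS[chunk])
                after_consonant = True
                break
            if chunk in VOWELS:
                independent, sign = VOWELS[chunk]
                out.append(sign if after_consonant else independent)
                after_consonant = False
                break
        else:
            raise ValueError(f"no mapping for {keys[i]!r}")
        i += length
    if after_consonant:
        out.append(VIRAMA)  # a trailing bare consonant keeps no inherent vowel
    return "".join(out)


print(to_devanagari("kamala"))
print(to_devanagari("raama"))
```

Matching two-letter keys before one-letter keys lets "kh" and "aa" be typed as digraphs without a shift key, which is one common design choice for phonetic schemes.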
Text files of
large sizes can be handled by the Editor, typically up to 20,000 lines or
more in any of the scripts.
Selection of the script
The Editor is
truly multilingual and allows free mixing of all the scripts even on a
single line. English letters (i.e., text in English) can always be typed
in along with Indian scripts. See the illustration at the beginning where
the selection of languages is shown.
Text in one script may be immediately converted to another dynamically. In the
screen shot shown below, the first line entered in Devanagari has been
duplicated using the copy feature and each line dynamically changed to
a script of choice. Transliteration is based on the phonetic nature of
the languages of India, and the Editor permits correct transliteration of
aksharas across all the scripts using phonetically equivalent aksharas.
Thus aksharas not present in a language may also be shown using phonetic
equivalents for them. In the screen image below, see how Devanagari is
transliterated into Gurmukhi and Malayalam. It is quite possible
that modern Gurmukhi may not show the conjunct in the form shown. The fourth
line is in Sinhalese and the same has been transliterated into Devanagari
in the fifth line.
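
The idea of transliterating through phonetic equivalents can be sketched with a script-neutral code per akshara and one lookup table per script. Everything named here is an assumption made for illustration: the codes (`KA`, `KHA`, ...), the tiny tables, and the fallback rules are hypothetical, and Unicode characters stand in for the Editor's own fonts and internal codes.

```python
# One lookup table per script: script-neutral sound code -> glyph.
TABLES = {
    "devanagari": {"KA": "क", "KHA": "ख", "GA": "ग", "MA": "म", "LA": "ल"},
    "gurmukhi":   {"KA": "ਕ", "KHA": "ਖ", "GA": "ਗ", "MA": "ਮ", "LA": "ਲ"},
    # Tamil script has no separate letters for kha and ga.
    "tamil":      {"KA": "க", "MA": "ம", "LA": "ல"},
}

# Phonetic equivalents used when a script lacks an akshara (illustrative).
FALLBACK = {"KHA": "KA", "GA": "KA"}


def to_codes(text, script):
    """Decode text in one script into script-neutral sound codes."""
    reverse = {glyph: code for code, glyph in TABLES[script].items()}
    return [reverse[ch] for ch in text]


def render(codes, script):
    """Display sound codes in a script, substituting phonetic equivalents."""
    table = TABLES[script]
    out = []
    for code in codes:
        while code not in table:
            code = FALLBACK[code]  # KeyError if no equivalent is defined
        out.append(table[code])
    return "".join(out)


def transliterate(text, src, dst):
    return render(to_codes(text, src), dst)


print(transliterate("कमल", "devanagari", "gurmukhi"))
print(transliterate("खग", "devanagari", "tamil"))
```

Because the stored text is the sequence of sound codes rather than glyphs, changing the script of a line only means rendering the same codes through a different table.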
Paste into other applications.
The text prepared
using the Editor may be pasted into applications such as Microsoft Word,
Wordpad, Instant Messenger, Outlook Express and many others. In essence
the IITM Editor allows many Windows applications to be enabled with all
Indian languages. One need not, therefore, look for Word in Indian Languages
with its limited features in handling the Indian scripts. Seen below
are examples of cut and paste. In one case, the text from the Editor is
copied into Word, where it can be formatted further. A more interesting
application is seen where the text from the Editor is copied into the
composer window of Outlook Express. Email in Indian languages is just a
click away from the Editor.
The Editor also supports
Find/Replace on strings in local languages, as the screen image given
below illustrates. The keystrokes are echoed in Roman and the text string
itself is displayed in a separate window. The language selection
allows strings to be entered in specific languages.
Support for more than 10,000 aksharas across all the Indian scripts.
The Editor allows correct data
entry for a large number of conjuncts (Samyuktaksharas) across the different
languages. Approximately 800 conjuncts are recognized by the Editor and
each one of these may combine with one of up to 16 vowels to yield the above
number. The data entry scheme also permits new conjuncts to be typed in,
consistent with the rules of the writing system for the scripts.
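
The akshara count follows from simple multiplication: about 800 conjuncts, each combined with one of up to 16 vowels, gives at most 800 x 16 = 12,800 aksharas, which is consistent with the figure of more than 10,000. The sketch below shows the same combination on one conjunct. It assumes Unicode Devanagari rather than the Editor's internal codes, the helper names are hypothetical, and only a subset of the vowel signs is listed.

```python
VIRAMA = "\u094d"  # Devanagari virama (halant): joins consonants into a conjunct

# Dependent vowel signs; "" stands for the inherent vowel "a".
# This is a subset, not the full set of up to 16 vowels.
VOWEL_SIGNS = ["", "\u093e", "\u093f", "\u0940", "\u0941",
               "\u0942", "\u0947", "\u0948", "\u094b", "\u094c"]


def conjunct(*consonants):
    """Build a conjunct by placing a virama between successive consonants."""
    return VIRAMA.join(consonants)


def aksharas_for(cluster):
    """Combine one consonant or conjunct with every listed vowel sign."""
    return [cluster + sign for sign in VOWEL_SIGNS]


def upper_bound(conjuncts, vowels):
    """Largest possible number of conjunct-plus-vowel aksharas."""
    return conjuncts * vowels


ksha = conjunct("\u0915", "\u0937")  # ka + virama + ssa
print(" ".join(aksharas_for(ksha)))
print(upper_bound(800, 16))  # 12800
```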
For each script, up to 13 punctuation marks and 10 numerals (in their respective
scripts) are supported. Traditionally Indian scripts have used few punctuation
marks, if any. However, current requirements for publishing text in Indian
languages presuppose the availability of most of the Roman punctuation marks.
Data entry allows
for typing in Vedic accent marks in Devanagari and the Grantha scripts.
Samavedic accent marks are also supported for Grantha, the script used
in South India for writing Sanskrit.
Hebrew and other Semitic languages/scripts, which are written from right
to left, are supported in the right-to-left version of the Multilingual
Editor. There are two versions of the right-to-left Editor as well.
In the first, the text generated conforms to the correct sorting order (alphabetical
order) for the native Semitic script. In the second, the text conforms
to the sorting order of Indian aksharas. Thus the first version makes
linguistic processing in Urdu, Arabic, etc. very easy.
Details of the Arabic, Urdu editor
are presented in a separate page.
The design of
the Editor allows new scripts to be introduced without difficulty. The
basic principle of the design rests on the concept of the akshara, and the
internal representation is the equivalent of the akshara (i.e., a sound).
Hence the display of the akshara can be effected in any script through
lookup tables. The Multilingual Editor will also accommodate new
fonts for any script. The tools for introducing new fonts are included
in the IITM package. However, the wide variations seen in the fonts designed
for Indian scripts make it virtually impossible to guarantee that all
the aksharas will be properly rendered. The set of fonts recommended by
IIT Madras fulfills the requirements for correct rendering of all the aksharas
in all the languages. The Editor package includes these fonts. For an interesting
discussion on the vagaries of fonts for Indian
scripts, please visit the corresponding page.
Versions for Tamil
There are two specially
designed versions of the Editor which conform to the data entry standards
recommended during the Tamilnet99 conference held at Chennai. The first
conforms to the phonetic standard and the second allows data entry based
on a standard manual Tamil typewriter keyboard.
Details and installation instructions are given in the readme file linked below.
(To be read first)
HTML document describing
the use of the Editor. Also deals with aspects of Indian languages and
on data entry schemes in different versions of the Editor
IBM PC compatibles running Microsoft Windows (98/Me/2000/XP) or Linux.
16 MB of main memory (32 MB will help for good performance).
About 5 MB of hard disk space.
An SVGA graphics card with a resolution of 800x600 or better.