Unicode can be supported
(Recommendations for developers)
During the past
decade (1991-2002), the Systems Development Laboratory, IIT Madras, has gained
much insight into computing with Indian languages. In fact, one
of the applications developed in the lab relates to direct handling of
multibyte codes during data entry in an interactive application. The multilingual
editor has a special feature which permits users to enter data in ITRANS.
In effect this is equivalent to typing in the vowels and consonants but
storing them in terms of syllables. The same approach can work with Unicode
as well and a number of applications can indeed be developed on this basis.
Developers can provide support for Unicode in all their applications both under Linux
and Windows, if they agree to honour a few restrictions and also allow
for two new Unicode values to be introduced as explained below.
Introduce an additional code
for the halanth which can be interpreted as a null
vowel. So a consonant followed by the null vowel can also be
treated as a proper syllable. The current halanth code can be retained
to indicate syllable formation with succeeding consonants. During data
entry one would use the null vowel to explicitly form generic consonants.
In effect, this new code would do the same job as that of a zero width
non joiner but has the advantage that it can be interpreted as a syllable
without having to worry about the next character.
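
The effect of the null vowel can be illustrated with a minimal Python sketch. It assumes text already stored in the proposed form (consonants followed by regular vowels, no Matras). The code point used for `NULL_VOWEL` is a hypothetical placeholder taken from the Private Use Area, not an assigned Unicode value, and `split_syllables` is an illustrative name.

```python
HALANT = "\u094d"      # existing Devanagari halanth (virama), joins consonants
NULL_VOWEL = "\uf8f0"  # hypothetical placeholder for the proposed null vowel
VOWELS = {chr(c) for c in range(0x0905, 0x0915)}  # independent vowels U+0905..U+0914


def split_syllables(text):
    """Split text into syllables, closing each one at a vowel or null vowel.

    No lookahead is needed: a vowel or the null vowel always ends the
    current syllable, while the halanth keeps it open for more consonants.
    """
    syllables, current = [], ""
    for ch in text:
        current += ch
        if ch in VOWELS or ch == NULL_VOWEL:
            syllables.append(current)
            current = ""
    if current:
        syllables.append(current)  # trailing cluster with no closing vowel
    return syllables
```

With the halanth, the consonants KA and TA followed by the vowel I stay together as one conjunct syllable. With the null vowel after KA, the same letters split into a generic consonant followed by a separate syllable.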
Agree to restrict syllable formation
to a limited set and avoid arbitrarily long syllables. Provide a standardized
API through which developers collect keystrokes, so that syllables
can be formed in a consistent fashion.
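
One possible shape for such a keystroke API is sketched below in Python. Everything here is an assumption made for illustration: the class name `SyllableCollector`, the limit of four consonants per syllable (the text does not give a number), and the simplification that each key is a single Latin letter, which is cruder than real ITRANS with its multi-letter codes.

```python
MAX_CONSONANTS = 4         # hypothetical limit; the text does not specify one
VOWEL_KEYS = set("aeiou")  # simplified: one key per vowel, unlike full ITRANS


class SyllableCollector:
    """Collect keystrokes and group them into (consonants, vowel) syllables."""

    def __init__(self, max_consonants=MAX_CONSONANTS):
        self.max_consonants = max_consonants
        self.pending = []    # consonant keys typed for the current syllable
        self.syllables = []  # completed (consonants, vowel) pairs

    def feed(self, key):
        if key in VOWEL_KEYS:
            # A vowel closes the syllable that is being built.
            self.syllables.append(("".join(self.pending), key))
            self.pending = []
        elif len(self.pending) >= self.max_consonants:
            raise ValueError("syllable exceeds the allowed number of consonants")
        else:
            self.pending.append(key)


def collect(keys, max_consonants=MAX_CONSONANTS):
    """Feed a sequence of keys and return the syllables formed."""
    collector = SyllableCollector(max_consonants)
    for key in keys:
        collector.feed(key)
    return collector.syllables
```

Because every application would go through the same `feed` call, the same keystrokes always yield the same syllables, and an over-long consonant cluster is rejected at entry time rather than at rendering time.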
Get rid of the codes for the
Matras. You can use a regular vowel in place of the Matra since this does
not in any way affect syllable formation. When you require a pure vowel
inside a word, you can have a new Unicode value for what we may call a
null consonant which can be typed in before the vowel. This null
consonant simply implies that there is no consonant in the syllable and
only a vowel which should be displayed in its pure form. The null consonant
is the second new Unicode character we need to introduce.
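
As a rough illustration, the Python sketch below converts ordinary Devanagari Unicode text into the Matra-free form described here. The value chosen for `NULL_CONSONANT` is a hypothetical Private Use Area placeholder, the function name `to_proposed` is illustrative, only the common Matras are mapped, and the rule "a vowel that follows another Devanagari character is word-internal" is a simplifying assumption.

```python
NULL_CONSONANT = "\uf8f1"  # hypothetical placeholder for the proposed null consonant

# Matra (dependent vowel sign) -> regular (independent) vowel, common cases only.
MATRA_TO_VOWEL = {
    "\u093e": "\u0906",  # AA
    "\u093f": "\u0907",  # I
    "\u0940": "\u0908",  # II
    "\u0941": "\u0909",  # U
    "\u0942": "\u090a",  # UU
    "\u0943": "\u090b",  # vocalic R
    "\u0947": "\u090f",  # E
    "\u0948": "\u0910",  # AI
    "\u094b": "\u0913",  # O
    "\u094c": "\u0914",  # AU
}
INDEPENDENT_VOWELS = {chr(c) for c in range(0x0905, 0x0915)}


def in_devanagari_block(ch):
    return ch != "" and 0x0900 <= ord(ch) <= 0x097F


def to_proposed(text):
    """Replace Matras with regular vowels and mark pure vowels inside a word."""
    out, prev = [], ""
    for ch in text:
        if ch in MATRA_TO_VOWEL:
            out.append(MATRA_TO_VOWEL[ch])
        elif ch in INDEPENDENT_VOWELS and in_devanagari_block(prev):
            # Pure vowel inside a word: prefix it with the null consonant.
            out.append(NULL_CONSONANT + ch)
        else:
            out.append(ch)
        prev = ch
    return "".join(out)
```

KA followed by the Matra for I becomes KA followed by the regular vowel I, while KA followed by the independent vowel II gains a null consonant before the vowel so that it is displayed in its pure form.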
Use only True type fonts
but make sure that the font rendering program will correctly handle zero
width glyphs (this was the case with Win95/98/Me). Very high quality typography
is indeed possible with True type fonts. Text prepared with fonts such
as Sanskrit 1.2 or Sanskrit98 and printed under Win9X looks so much
better compared to what one can get with the Microsoft Open type font Mangal.
These fonts are so well designed that they cater to a very large set of
ligatures including Vedic accents, consonants with dots below and the very
interesting bowed representation of a soft "ra" in Marathi.
If we agree to use a True type
font, we can actually place the glyphs in the E000 region and include as
many as 250 glyphs for a script to take care of intricate ligatures as
well. (Several years ago a special Metafont designed for Devanagari actually
supported the generation of more than a thousand conjuncts as well as Vedic
symbols with just about 240 glyphs!)
If we carefully design
our True type fonts, we can create a multilingual font supporting all the
important scripts (nine of them) and place the glyphs in the E000-E9FF
region, where each script will have close to 250 glyphs. We can include
many common glyphs in this font including punctuation marks, special symbols
and such which we could not manage in a regular True type font for want
of glyphs. A comparable Open type font would require at least 650 glyphs
per script and we can see that it will be difficult to manage such a huge
font, let alone design one.
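
The arithmetic of this layout can be sketched in Python. The sketch assumes one 256-slot block per script starting at E000, with a tenth block (E900-E9FF) for the common glyphs. The script names, their order, and the function name `glyph_code_point` are illustrative assumptions; the text only says there are nine scripts.

```python
PUA_BASE = 0xE000   # start of the Private Use Area region named in the text
BLOCK_SIZE = 0x100  # 256 slots per block, enough for about 250 glyphs

# Hypothetical ordering: nine scripts plus one block for common glyphs.
BLOCKS = [
    "devanagari", "bengali", "gurmukhi", "gujarati", "oriya",
    "tamil", "telugu", "kannada", "malayalam", "common",
]


def glyph_code_point(block, glyph_index):
    """Return the Private Use Area code point of a glyph within a block."""
    if not 0 <= glyph_index < BLOCK_SIZE:
        raise ValueError("glyph index must be in the range 0..255")
    return PUA_BASE + BLOCKS.index(block) * BLOCK_SIZE + glyph_index
```

Under these assumptions the ten blocks fill E000-E9FF exactly, and any font that places its glyphs at the same offsets can render the same text.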
True type fonts also
have other advantages. The rendering process is not tied to the availability
of a specific font so long as the glyphs are present at the expected location.
We can prepare text and get it rendered in any font of our choice where
the glyphs occupy the specified locations. With Open type fonts, the
characters will not be rendered correctly unless the input uses the
assigned Unicode values rather than glyph codes.