Unicode can be supported
(Recommendations for developers)
During the past decade (1991-2002), the Systems Development Laboratory, IIT Madras, has gained much insight into computing with Indian languages. One of the applications developed in the lab handles multibyte codes directly during data entry in an interactive application. Its multilingual editor has a special feature that permits users to enter data in ITRANS. In effect, the user types in vowels and consonants, but the text is stored in terms of syllables. The same approach can work with Unicode as well, and a number of applications can indeed be developed on this basis.
    Developers can provide support for Unicode in all their applications, under both Linux and Windows, if they agree to honour a few restrictions and allow two new Unicode values to be introduced, as explained below.
     
  • Introduce an additional code for the halanth, to be interpreted as a null vowel. A consonant followed by the null vowel can then be treated as a proper syllable. The current halanth code can be retained to indicate syllable formation with succeeding consonants. During data entry one would use the null vowel to explicitly form generic consonants. In effect, this new code does the same job as a zero width non-joiner, with the advantage that the consonant can be interpreted as a syllable without having to examine the next character.

  • Agree to restrict syllable formation to a limited set and avoid arbitrarily long syllables. Provide a standardized API through which developers collect keystrokes, so that syllables are formed in a consistent fashion.

  • Get rid of the codes for the Matras. A regular vowel can be used in place of the Matra, since this does not in any way affect syllable formation. When a pure vowel is required inside a word, a new Unicode value for what we may call a null consonant can be typed in before the vowel. This null consonant simply indicates that the syllable has no consonant, only a vowel, which should be displayed in its pure form. The null consonant is the second new Unicode character we need to introduce.

  • Use only True type fonts, but make sure that the font rendering program correctly handles zero width glyphs (this was the case with Win95/98/Me). Very high quality typography is indeed possible with True type fonts. Text prepared with fonts such as Sanskrit 1.2 or Sanskrit98 and printed under Win9X looks much better than what one can get with the Microsoft Open type font Mangal. These fonts are so well designed that they cater to a very large set of ligatures, including Vedic accents, consonants with dots below and the very interesting bowed representation of a soft "ra" in Marathi.

  • If we agree to use a True type font, we can place the glyphs in the E000 region and include as many as 250 glyphs for a script, enough to take care of intricate ligatures as well. (Several years ago, a special Metafont designed for Devanagari supported the generation of more than a thousand conjuncts, as well as Vedic symbols, with just about 240 glyphs!)


  • If we carefully design our True type fonts, we can create a multilingual font supporting all the important scripts (nine of them) and place the glyphs in the region E000-E9FF, where each script will have close to 250 glyphs. We can include many common glyphs in this font, including punctuation marks and special symbols, which we could not manage in a regular True type font for want of glyphs. A comparable Open type font would require at least 650 glyphs per script, and we can see that it will be difficult to manage such a huge font, let alone design one.
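
The first three recommendations above (a null vowel, a bounded syllable length, and a null consonant replacing the Matras) can be illustrated with a small syllable splitter. This is only a sketch under stated assumptions: the two new characters have no assigned Unicode values, so the code points U+F8F0 and U+F8F1 below are hypothetical placeholders taken from the Private Use Area; the limit of four consonants per syllable is likewise an assumed figure, and only the basic Devanagari consonant and vowel ranges are recognised.

```python
# Sketch of syllable formation under the proposed scheme (Devanagari only).
# NULL_VOWEL and NULL_CONSONANT are HYPOTHETICAL placeholder code points;
# the proposal does not assign actual values to them.
HALANT = "\u094D"          # existing Devanagari halant (virama)
NULL_VOWEL = "\uF8F0"      # hypothetical: consonant + null vowel = generic consonant
NULL_CONSONANT = "\uF8F1"  # hypothetical: null consonant + vowel = pure vowel
MAX_CONSONANTS = 4         # assumed upper bound on consonants in one syllable


def is_consonant(ch):
    """True for the basic Devanagari consonants KA..HA."""
    return "\u0915" <= ch <= "\u0939"


def is_vowel(ch):
    """True for the regular (independent) Devanagari vowels A..AU."""
    return "\u0905" <= ch <= "\u0914"


def split_syllables(text):
    """Split text into syllables; other characters pass through one by one."""
    syllables = []
    i, n = 0, len(text)
    while i < n:
        start = i
        ch = text[i]
        if ch == NULL_CONSONANT:
            # A pure vowel inside a word: null consonant followed by a vowel.
            if i + 1 < n and is_vowel(text[i + 1]):
                i += 2
            else:
                raise ValueError("null consonant must be followed by a vowel")
        elif is_consonant(ch):
            count = 1
            i += 1
            # The halant joins this consonant to the one that follows it.
            while i + 1 < n and text[i] == HALANT and is_consonant(text[i + 1]):
                count += 1
                if count > MAX_CONSONANTS:
                    raise ValueError("syllable exceeds the agreed length")
                i += 2
            # A regular vowel (no Matra) or the null vowel closes the syllable.
            if i < n and (is_vowel(text[i]) or text[i] == NULL_VOWEL):
                i += 1
        else:
            # A word-initial vowel, or any character outside the scheme.
            i += 1
        syllables.append(text[start:i])
    return syllables


# Example input: "kSSi", a generic consonant "t", then a pure vowel "a".
KA, SSA, TA, VOWEL_I, VOWEL_A = "\u0915", "\u0937", "\u0924", "\u0907", "\u0905"
sample = KA + HALANT + SSA + VOWEL_I + TA + NULL_VOWEL + NULL_CONSONANT + VOWEL_A
print(len(split_syllables(sample)))  # three syllables
```

Because the null vowel closes a syllable by itself, the splitter never has to look past it to decide where the syllable ends, which is the advantage claimed over the zero width non-joiner.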

      True type fonts also have other advantages. The rendering process is not tied to the availability of a specific font, so long as the glyphs are present at the expected locations. We can prepare text and have it rendered in any font of our choice in which the glyphs occupy the specified locations. With Open type fonts, the input must use the assigned Unicode code values rather than glyph codes; otherwise the characters will not be rendered right.
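
The glyph layout described above can be worked out with simple arithmetic. The sketch below assumes that the region E000-E9FF is divided into blocks of 256 positions (giving "close to 250" usable glyphs each), that the nine scripts take the first nine blocks, and that the tenth block holds the common glyphs. The script names, their order and the use of the tenth block are illustrative assumptions; the text above does not specify them.

```python
# Sketch of a fixed glyph layout in the Private Use Area, E000-E9FF.
# The script list, its order and the "Common" block are ASSUMPTIONS made
# for illustration only.
PUA_START = 0xE000
PUA_END = 0xE9FF
BLOCK_SIZE = 0x100  # 256 positions per block

SCRIPTS = [
    "Devanagari", "Bengali", "Gurmukhi", "Gujarati", "Oriya",
    "Tamil", "Telugu", "Kannada", "Malayalam",
    "Common",  # assumed block for punctuation marks and special symbols
]


def block_base(script):
    """Return the first code point of the block assigned to a script."""
    base = PUA_START + SCRIPTS.index(script) * BLOCK_SIZE
    if base + BLOCK_SIZE - 1 > PUA_END:
        raise ValueError("script block falls outside E000-E9FF")
    return base


def glyph_char(script, glyph_index):
    """Return the character that selects glyph number glyph_index of a script."""
    if not 0 <= glyph_index < BLOCK_SIZE:
        raise ValueError("glyph index outside the script's block")
    return chr(block_base(script) + glyph_index)


def glyph_string(script, glyph_indices):
    """Turn a sequence of glyph numbers into text that any font with this layout can show."""
    return "".join(glyph_char(script, g) for g in glyph_indices)


total_blocks = (PUA_END - PUA_START + 1) // BLOCK_SIZE
print(total_blocks)                      # number of 256-position blocks in the region
print(hex(block_base("Malayalam")))      # start of the ninth script block
print([hex(ord(c)) for c in glyph_string("Tamil", [0, 10, 255])])
```

Since the string produced by `glyph_string` depends only on the agreed positions, the same text can be displayed with any font that places its glyphs at those positions.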
