Review of Microsoft applications
(with Unicode support for Indian languages)

  Unicode support for Indian languages/scripts is, in principle, available under Windows 2000/XP. Currently Notepad, Wordpad and Word 2000 seem to provide application-level support and allow data entry and word processing in Devanagari and Tamil. For this purpose Microsoft includes two OpenType fonts, Mangal and Latha, for Devanagari and Tamil respectively.

  Data entry is based on the INSCRIPT keyboard layout standardized for ISCII. This keyboard mapping is uniform across the languages with respect to the keystrokes for the basic vowels and consonants. With the INSCRIPT method it may not be possible to type the full complement of aksharas consistent with the conventions followed in the writing systems. The layout also lacks keys for some of the punctuation marks, and there are no specific keys for typing the zero width modifier characters. These can be entered only by typing the decimal equivalent of the Unicode value while keeping the ALT key pressed.
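  As a small illustration of the ALT key method, the sketch below computes the decimal equivalents one would type. It assumes that the zero width modifier characters in question are the zero width non-joiner (U+200C) and the zero width joiner (U+200D); the function name `alt_code` is only a label chosen for this example.

```python
# Decimal equivalents of Unicode values, as typed with the ALT key held down.
# Assumption: the "zero width modifier characters" are ZWNJ and ZWJ.
ZERO_WIDTH = {
    "ZWNJ": "\u200c",  # zero width non-joiner
    "ZWJ": "\u200d",   # zero width joiner
}


def alt_code(ch: str) -> int:
    """Return the decimal equivalent of a single character's Unicode value."""
    return ord(ch)


if __name__ == "__main__":
    for name, ch in ZERO_WIDTH.items():
        print(f"{name}: U+{ord(ch):04X} -> ALT + {alt_code(ch)}")
```

  The same lookup works for any other character that has no key on the layout.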

  Among the applications in the Office 2000 suite, Word 2000 seems to implement text rendering using Uniscribe. Excel does not seem to use the shaping engine.

  The extent to which data entry is supported in a manner consistent with the requirements of Unicode seems to vary across the applications. The Find and Replace boxes do not seem to accept Unicode characters entered by their decimal equivalents.

  Text rendering across applications is not consistent and is quite arbitrary. Word 2000 runs into problems in estimating the length of words, and this causes unacceptable gaps between words. Editing also behaves differently depending on whether you backspace or delete: Delete removes a whole syllable to the right of the cursor, while Backspace removes only the last part of the syllable before the cursor.
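  The asymmetry between Delete and Backspace can be modelled roughly as follows. This is only a sketch of the observed behaviour, not Word's actual logic: the syllable pattern is a deliberately simplified one for Devanagari (consonants joined by the virama, then an optional vowel sign and an optional modifier), and the function names are hypothetical.

```python
import re

# Simplified, hypothetical Devanagari syllable pattern (an assumption made
# for illustration; real syllable rules are more involved).
CONS = "[\u0915-\u0939]"    # consonants KA..HA
VIRAMA = "\u094d"           # halant
MATRA = "[\u093e-\u094c]"   # dependent vowel signs
MOD = "[\u0901-\u0903]"     # candrabindu, anusvara, visarga
SYLLABLE = re.compile(
    f"(?:{CONS}{VIRAMA})*{CONS}{MATRA}?{MOD}?|.", re.DOTALL
)


def delete_forward(text: str, cursor: int) -> str:
    """Delete: remove the whole syllable to the right of the cursor."""
    m = SYLLABLE.match(text, cursor)
    if m is None:  # cursor is at the end of the text
        return text
    return text[:cursor] + text[m.end():]


def backspace(text: str, cursor: int) -> str:
    """Backspace: remove only the last code point before the cursor."""
    if cursor == 0:
        return text
    return text[:cursor - 1] + text[cursor:]


if __name__ == "__main__":
    stri = "\u0938\u094d\u0924\u094d\u0930\u0940"  # one syllable, six code points
    print(len(delete_forward(stri, 0)))        # whole syllable removed
    print(len(backspace(stri, len(stri))))     # only the vowel sign removed
```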

  Cutting and pasting across applications results in many inconsistencies.

  There is very little support by way of linguistic processing. String matching in Word 2000 seems to match syllables but fails in the presence of some zero width modifiers.
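  One way an application might make matching insensitive to these characters is to filter them out of both the text and the search pattern before comparing. The sketch below assumes the zero width modifiers are ZWNJ (U+200C) and ZWJ (U+200D); the helper names are hypothetical, and this is not a description of how Word 2000 performs its search.

```python
# Assumption: the zero width modifiers that disturb matching are ZWNJ and ZWJ.
ZERO_WIDTH = {"\u200c", "\u200d"}


def strip_zero_width(s: str) -> str:
    """Drop zero width joiner/non-joiner characters from a string."""
    return "".join(ch for ch in s if ch not in ZERO_WIDTH)


def contains_ignoring_zero_width(text: str, pattern: str) -> bool:
    """Substring test that ignores zero width modifiers on both sides."""
    return strip_zero_width(pattern) in strip_zero_width(text)


if __name__ == "__main__":
    text = "\u0915\u094d\u200d\u0937"   # KA + VIRAMA + ZWJ + SSA
    pattern = "\u0915\u094d\u0937"      # KA + VIRAMA + SSA
    print(pattern in text)                              # naive match
    print(contains_ignoring_zero_width(text, pattern))  # filtered match
```

  Filtering is a blunt instrument, since it also discards any distinction the writer intended by inserting the modifier, so an application would have to decide when that loss is acceptable.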

  Text rendered in Devanagari departs from convention for many syllables whose components are normally written one below the other. This is not a serious problem for Hindi, but the alternate shapes indicated are the ones that follow normal convention. We have used the IITM software to generate these forms and pasted them into the document.

  Microsoft's implementation of Uniscribe conforms to the recommendations in the Unicode book. However, a valid Unicode string in any Indian language need not contain linguistically meaningful information. Quite likely, algorithms which look for linguistic content in a Unicode string will get confused!

  The availability of Uniscribe to shape Unicode text does not guarantee anything with respect to linguistic processing of text. This is the responsibility of the application, and each application must code into itself enough linguistic knowledge to effect any meaningful text processing. The multibyte representation of a syllable, coupled with the need to filter out characters which carry only rendering information, can cause applications to become really messy. In the illustration below, the same linguistic content is displayed in twelve different ways, all legal in terms of Unicode representation. For an application to actually figure out that the strings convey the same linguistic information, very complex text processing will be required.
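  To make the point about multibyte representation concrete, the sketch below measures one conjunct with and without a zero width joiner and shows that standard Unicode normalization (NFC) does not reduce the two strings to a common form. The choice of syllable and the helper name `describe` are assumptions made for this example; they are not taken from the illustration or from aditya.txt.

```python
import unicodedata

KA, VIRAMA, SSA, ZWJ = "\u0915", "\u094d", "\u0937", "\u200d"

plain = KA + VIRAMA + SSA            # a single conjunct syllable
with_zwj = KA + VIRAMA + ZWJ + SSA   # same consonants plus a rendering hint


def describe(s: str) -> dict:
    """Report how many code points and bytes a string occupies."""
    return {
        "code_points": len(s),
        "utf8_bytes": len(s.encode("utf-8")),
        "utf16_bytes": len(s.encode("utf-16-le")),
    }


if __name__ == "__main__":
    print(describe(plain))
    print(describe(with_zwj))
    # Normalization leaves the joiner in place, so the strings still differ.
    same = unicodedata.normalize("NFC", plain) == unicodedata.normalize("NFC", with_zwj)
    print("equal after NFC:", same)
```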

  Download a copy of the file aditya.txt (Unicode Text file). Keep the shift key pressed while clicking on the link to prevent your browser from displaying the contents! The file may then be opened under Wordpad.