Unicode: A Viewpoint from SDL

The Multilingual Systems project at IIT Madras was started around the time ISCII evolved into a standard. It was clear to the development team that although ISCII was conceived as the basis for a syllabic representation of text in Indian languages, one still had to process a variable number of bytes to identify a complete syllable. This variable-length coding makes text processing very complex, especially in the presence of codes that have no linguistic significance but are required to render the syllable correctly.
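The variable-length problem can be illustrated with a minimal Python sketch. It uses Unicode code points rather than ISCII bytes (an assumption made only because Python handles Unicode natively), and the four Devanagari syllables and their labels are chosen here purely for illustration.

```python
import unicodedata

# Four Devanagari syllables, spelled out with explicit code points.
# The labels on the left are informal transliterations used only in this sketch.
syllables = {
    "ka": "\u0915",                                  # क   (consonant)
    "ki": "\u0915\u093F",                            # कि  (consonant + vowel sign)
    "kSi": "\u0915\u094D\u0937\u093F",               # क्षि (consonant + virama + consonant + vowel sign)
    "strI": "\u0938\u094D\u0924\u094D\u0930\u0940",  # स्त्री (three consonants joined by viramas + vowel sign)
}


def describe(text):
    """Return the Unicode character name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


for label, text in syllables.items():
    code_points = len(text)
    utf8_bytes = len(text.encode("utf-8"))
    print(label, code_points, utf8_bytes, describe(text))
```

Each of these is a single syllable to a reader, yet the stored length ranges from one code point (3 bytes in UTF-8) for "ka" to six code points (18 bytes) for "strI". Any routine that counts, indexes or compares syllables must first work out where each one ends.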

In recent years, software developers have given serious thought to supporting Unicode for Indian languages. Unicode for Indian languages essentially evolved from ISCII and retains the essence of the eight-bit coding scheme, although separate code assignments have been made for each script. There has been a continuing debate worldwide about the suitability of Unicode for applications in Indian languages, but the open commitment given by Microsoft has led many developers to fall in line with Unicode.

From the very beginning, the Multilingual Systems project at IIT Madras saw the futility of attempting text and linguistic processing with variable-length codes for syllables, and therefore evolved a uniform two-byte scheme to simplify text processing.
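The general idea of a fixed-width syllable code can be sketched as follows. This is not the IIT Madras scheme, whose code assignments are not described on this page. It is a hypothetical illustration in which each distinct syllable is simply given the next free 16-bit number. The `SyllableTable` class and the simplified Devanagari-only syllable pattern are assumptions of this sketch.

```python
import re
from array import array

# Simplified Devanagari syllable pattern (an assumption of this sketch):
#   consonant cluster joined by viramas, optional final virama or vowel sign,
#   optional candrabindu/anusvara/visarga; or an independent vowel;
#   or any other single character as a fallback.
CONS = "[\u0915-\u0939]\u093C?"
SYLLABLE = re.compile(
    "(?:" + CONS + "\u094D)*" + CONS + "(?:\u094D|[\u093E-\u094C])?[\u0901-\u0903]?"
    "|[\u0904-\u0914][\u0901-\u0903]?"
    "|.",
    re.DOTALL,
)


class SyllableTable:
    """Hypothetical table mapping each distinct syllable to one 16-bit code."""

    def __init__(self):
        self.ids = {}    # syllable string -> 16-bit code
        self.units = []  # 16-bit code -> syllable string

    def encode(self, text):
        """Convert Unicode text into an array of fixed-width syllable codes."""
        codes = array("H")  # unsigned 16-bit items on common platforms
        for unit in SYLLABLE.findall(text):
            if unit not in self.ids:
                if len(self.units) >= 0x10000:
                    raise OverflowError("more than 65536 distinct syllables")
                self.ids[unit] = len(self.units)
                self.units.append(unit)
            codes.append(self.ids[unit])
        return codes

    def decode(self, codes):
        """Convert syllable codes back into Unicode text."""
        return "".join(self.units[c] for c in codes)


table = SyllableTable()
word = "\u0938\u0902\u0938\u094D\u0915\u0943\u0924"  # संस्कृत: 7 code points
codes = table.encode(word)
print(len(word), len(codes), table.decode(codes) == word)
```

With one code per syllable, the syllable count of a string is just the length of the array, and the n-th syllable is reached by direct indexing. The conversion back to Unicode remains available whenever the text has to leave the program.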

Adhering to a meaningful standard in which developers see distinct advantages is important, but a standard becomes meaningful only if it can accommodate most of what has been successfully attempted earlier. In this respect, Unicode for Indian languages poses fairly serious challenges, and to date (March 2005) no satisfactory implementation of a useful application can be cited as an example.

 The purpose of this article is not to present an argument against using Unicode but to bring out the real difficulties in coping with its implementation for Indian languages.

Many of the complexities involved in rendering Unicode text through Uniscribe (Microsoft's shaping engine) or equivalent interfaces will be taken up one by one, and the difficulties faced in linguistic processing will be explained. Where required, test files have been included for readers to download and verify the points made.

The information provided here will probably convince the reader that it is quite difficult to work with Unicode for Indian languages, and that alternatives for text processing deserve serious consideration. There is, however, broad consensus on using Unicode for transporting information across systems.


 
 
