Unicode for Indian Languages: A discussion
  Support for Unicode in applications catering to Indian languages is a highly debated issue. Though Unicode has emerged as a viable standard and is finding increasing use all over the world, there are real difficulties in using it in practice to build applications with multilingual user interfaces in Indian languages. The conceptual basis for Unicode, though well accepted for the western languages (scripts), does not fully conform to the linguistic requirements of our languages.

  At the Systems Development Laboratory, IIT Madras, which has developed and distributed multilingual solutions consistent with the linguistic requirements of all the Indian languages, there is a strong feeling that Unicode will not really help. It is true that Unicode is a world standard proposed and accepted by a large community of academics, professionals and users. Unfortunately, it does not blend well with the syllabic writing systems used in India, much less provide the means to express linguistic content without ambiguity and in a manner that ties in well with our own understanding of languages.

What we have tried to say here reflects the above view.

Multilingual Computing: A view from SDL

Idiosyncrasies of the writing systems
Defining Linguistic requirements
Dealing with Text consistent with Linguistic requirements
Multilingual computing requirements (for India)

Unicode for Indian Languages

The conceptual basis for Unicode
Unicode for Indian languages/scripts
Data entry and associated problems
Issues in rendering Unicode
Using a shaping engine to render Unicode text
Discussion on sorting or collation
The conceptual basis of the OpenType font

Unicode support in Microsoft applications

Uniscribe, the shaping engine
Limitations of Uniscribe
A review of some Microsoft applications with respect to handling linguistic content

Recommendations for Developers of Indian language Applications

Use of TrueType fonts to render Unicode Text
Can we simplify handling Unicode text?
Guidelines for development under Linux

Examples of Unicode Rendering by different applications (Windows and Linux)

circa 2003    circa 2007

Summary of Observations

The experiences of the lab in working with Unicode are summarized in the linked page. As of this update (June 2006), one has not seen an application in any of the Indian Languages that can be cited as a satisfactory implementation based on Unicode. Though a number of developers are counting on using Unicode, it is not going to be easy to effect Localization of our languages, consistent with the requirements of Computing with Indian Languages.
These pages were added to the acharya web site during the period March-April 2003.

The discussions deal with conceptual issues.

We have tried to provide as much information as possible to relate the many different aspects of computing in Indian languages to Unicode. Since the discussions relate to text representation in terms of syllables, repetition of the basic principles of syllabic writing systems discussed in the linked pages is unavoidable. Each topic is, in a way, self-contained.
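
As a small illustration of why syllables keep coming up in these discussions, the sketch below lists the Unicode code points behind one written syllable. It is not taken from the lab's software: the sample syllable (the Devanagari "kshi") and the helper name code_points are chosen here purely for illustration, and only Python's standard unicodedata module is assumed.

```python
import unicodedata

# Illustrative sample (an assumption, not from the original pages):
# the single Devanagari syllable "kshi", written with escapes so the
# underlying code points are explicit.
syllable = "\u0915\u094D\u0937\u093F"

def code_points(text):
    """Return (code point, Unicode character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# One syllable on paper, several code points in storage.
for cp, name in code_points(syllable):
    print(cp, name)

print("code points in one syllable:", len(syllable))
```

A reader sees a single unit of writing here, while the stored text is a sequence of consonant, virama, consonant and vowel sign. Any application that edits, sorts or searches such text by code point therefore has to reassemble syllables itself.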

Examples involving Microsoft applications were generated under WinXP/2000 and Microsoft Office 2000. It is certainly possible that the inconsistencies we have reported have already been taken care of in the newer versions of the software available today (June 2005).

Text displayed in the vernacular to illustrate specific linguistic issues was generated using the Multilingual Editor developed in the lab. The text is actually sent to your browser as an image generated on the fly, to allow more or less guaranteed viewing of the text on any browser.



Special thanks to Sri. Karthik Venkatesan, a friend of the lab who gave valuable suggestions in organizing these pages.


Last updated on 10/26/12