Unicode for Indian Languages: A discussion
Support for Unicode
in applications catering to Indian languages is a highly debated issue.
Though Unicode has emerged as a viable standard and is finding increasing
use all over the world, there are some real difficulties in using it in
practice for building applications supporting multilingual user interfaces
in Indian languages. The conceptual basis for Unicode, though well accepted
for the western languages (scripts), does not fully conform to the linguistic
requirements seen in our languages.
At the Systems Development
Laboratory, IIT Madras, which has developed and distributed multilingual
solutions consistent with the linguistic requirements of all the Indian
languages, there is a strong feeling that Unicode
will not really help. It is true that Unicode is a world standard proposed
and accepted by a large community of academics, professionals and users.
Unfortunately, it does not really blend with the syllabic writing systems
used in India, much less provide the means to express linguistic content
without ambiguity and in a manner that ties in well with our own understanding.
What we have tried to say
here reflects the above view.
A view from SDL
of the writing systems
with Text consistent with Linguistic requirements
computing requirements (for India)
conceptual basis for Unicode
for Indian languages/scripts
entry and associated problems
in rendering Unicode
a shaping engine to render Unicode text
on sorting or collation
conceptual basis of the Open type font
in Microsoft applications
the shaping engine
review of some Microsoft applications in respect of handling linguistic
for Developers of Indian language Applications
of True type fonts to render Unicode Text
we simplify handling Unicode text?
for development under Linux
Examples of Unicode Rendering by different applications (Windows and Linux):
circa 2003, circa 2007
The experiences of the lab in working with Unicode are summarized in the
linked page. As of this update (June 2006), we have not seen an application
in any of the Indian languages that can be cited as a satisfactory
implementation based on Unicode. Though a number of developers are counting
on Unicode, it is not going to be easy to localize software for our languages
in a way that is consistent with the requirements of computing with Indian
languages.
These pages were added to
the acharya web site during the period March-April 2003.
The discussions deal with the topics listed above.
We have tried to provide
as much information as possible to relate many different aspects of computing
in Indian languages with Unicode. Since the discussions relate to text
representation in terms of syllables, repetition of the basic principles
of syllabic writing systems discussed in the linked pages is unavoidable.
Each topic is, in a way, self-contained.
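As a small illustration of what "text representation in terms of syllables" means against Unicode's per-character encoding, the sketch below lists the code points that make up a single written syllable. It is not taken from the lab's software; the sample syllable (Devanagari "stri") and the helper name `describe` are choices made here purely for illustration, and only Python's standard `unicodedata` module is used.

```python
import unicodedata

# One written syllable in Devanagari, "stri" (illustrative choice, not from
# the source pages). It is stored as a sequence of separate code points.
syllable = "\u0938\u094D\u0924\u094D\u0930\u0940"


def describe(text):
    """Return (code point in hex, Unicode character name) for each code point."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


if __name__ == "__main__":
    for code, name in describe(syllable):
        print(code, name)
    # A reader sees one syllable; the encoded string holds several code points.
    print("code points in this one syllable:", len(syllable))
```

The point of the sketch is only that the unit a reader perceives (the syllable) and the unit Unicode encodes (the code point) differ, which is why operations such as text entry, rendering and sorting need additional processing on top of the raw code points.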
Examples involving Microsoft
applications were generated under Windows XP/2000 and Microsoft Office 2000.
It is certainly possible that the inconsistencies we have reported have
already been taken care of in the proposed (newer) versions of the software
available today (June 2005).
Text displayed in the vernacular
to illustrate specific linguistic issues was generated using the Multilingual
Editor developed in the lab. The text is actually sent to your browser
as an image generated on the fly, to allow more or less guaranteed viewing
of the text on any browser.
Special thanks to Sri.
Karthik Venkatesan, a friend of the lab, who gave valuable suggestions
on organizing these pages.