Unicode - A Brief Introduction
Introduction 

  In the context of internationalization and uniform handling of text-based information across the languages of the world, Unicode has gained considerable importance. The fundamental concept behind Unicode is that the text representation retains the linguistic content to be conveyed while also providing for that content to be displayed in human-readable form. By meeting both these requirements, Unicode has emerged as the best choice for representing text in a computer application, specifically one that deals with multilingual content. Developers across the world are committing themselves to providing Unicode support in all their applications.

  Multilingual information processing is one of the essential requirements of computerization in India. Here, application development requires that interactive user interfaces in the different regional languages be part of each application. A specific regional language may be supported through one or more scripts, even though a given script may serve more than one language.

  A very important issue, from a conceptual angle at least, is whether support for a script is equivalent to support for a language. During the initial phases of developing applications in the Indian languages, attention was focused on the rendering aspects of text, a formidable problem in itself on account of the syllabic writing system followed by all the Indian languages. No one really felt compelled to take text processing issues into consideration: the majority of the early applications required only text entry and display, with computation performed on numbers rather than on text per se. It is not surprising, therefore, that whatever standardization was attempted emphasized mostly the aspects of the writing system without really catering to the linguistic requirements.

  In essence, the standardization mentioned above (ISCII and Unicode) requires context-dependent processing of each character, as opposed to simple handling of a character by itself. Western scripts employ a relatively small set of shapes and symbols, which is sufficient to allow both the linguistic content and the rendering information to be exactly specified through the same set of codes. Consequently, text processing could be comfortably achieved using a small set of codes.
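The gap between stored codes and displayed shapes can be seen directly in any Unicode-aware environment. The sketch below (an illustration, not drawn from ISCII or the Unicode specification text) stores the Devanagari conjunct "kṣa" as three codepoints; a rendering engine must examine the context of each character to produce the single conjunct glyph the reader sees:

```python
# The three codepoints below carry the linguistic content of the
# conjunct "kṣa"; the single displayed glyph is produced by the
# rendering engine from context, not stored in the text itself.
ksha = "\u0915\u094D\u0937"   # KA + VIRAMA + SSA

print(ksha)        # displays as one conjunct glyph: क्ष
print(len(ksha))   # 3 -- storage is per codepoint, not per written shape
```

A Latin-script word, by contrast, has one code per displayed letter, which is why its processing and rendering can share a single small code set.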

  In respect of our languages, the complexities of the writing systems demand a large number of written shapes (typically in the thousands), even though the linguistic content may still be specified using a small set of codes for the vowels and consonants (typically fewer than a hundred). Hence it is not possible for the same set of codes to satisfy both requirements. In their wisdom, the designers of ISCII, and subsequently Unicode, essentially struck a compromise in which the smaller set of codes was recommended. Yet they yielded to the temptation of incorporating codes that convey rendering information as well. These codes took care of the Devanagari-derived writing systems but do not adequately address the writing systems of the South.

  The problem we face today with the efficient representation of text in our languages is precisely that we can neither do effective linguistic processing nor meet the real requirements of the writing systems.

  The Multilingual Systems Development Project at IIT Madras took the view that efficient text processing is absolutely essential, and is perhaps more important than precise rendering of text so long as ambiguities are avoided. The consequence of this decision was that the coding structure should preserve linguistic content while also providing complete rendering information within the flexibility offered by the writing system. Such a coding scheme would require syllables to be coded, since the linguistic content is expressed through syllables and the writing system displays syllables. The multilingual software applications developed at IIT Madras have successfully demonstrated that linguistic text processing at the syllable level is not only possible but can be accomplished using conventional algorithms that work with fixed-size codes. In contrast, application development with Unicode support has raised a number of issues which must be thoroughly discussed and understood before one accepts Unicode as a viable standard for computing in the Indian languages.
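To make the idea of syllable-level processing concrete, the following is a minimal sketch (it is not the IIT Madras implementation, and the function name `syllables` is our own) of grouping Unicode Devanagari codepoints into aksharas, the written syllables on which such processing would operate:

```python
# A minimal akshara (written-syllable) grouper for Devanagari text.
VIRAMA = "\u094D"                                       # joins consonants into a cluster
CONSONANTS = {chr(c) for c in range(0x0915, 0x093A)}    # KA .. HA
MATRAS = {chr(c) for c in range(0x093E, 0x094D)}        # dependent vowel signs

def syllables(text):
    """Group a sequence of Devanagari codepoints into aksharas."""
    out = []
    for ch in text:
        if out and (ch in MATRAS or ch == VIRAMA):
            out[-1] += ch            # vowel signs and virama attach to the current syllable
        elif out and out[-1].endswith(VIRAMA) and ch in CONSONANTS:
            out[-1] += ch            # a consonant joins an open cluster
        else:
            out.append(ch)           # otherwise start a new syllable
    return out

print(syllables("नमस्ते"))   # ['न', 'म', 'स्ते'] -- three aksharas from six codepoints
```

Once text is grouped this way, each syllable can in principle be mapped to a fixed-size code, allowing sorting, searching, and comparison with conventional algorithms, which is the essence of the approach described above.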

  In the light of the above, the Systems Development Laboratory, IIT Madras, is pleased to share with viewers the Lab's experiences in dealing with the linguistic and rendering issues of text in all the important scripts of India.


 
Multilingual Computing - A View from SDL

Introduction
Viewpoint
Writing systems
Linguistic requirements
Dealing with Text
Computing requirements (for India)


Unicode for Indian Languages

The conceptual basis for Unicode

Unicode for Indian scripts
Data entry
Issues in rendering Unicode
Using a shaping engine
Discussion on sorting
OpenType fonts


Unicode support in Microsoft applications

Uniscribe
Limitations of Uniscribe

A review of some MS applications supporting Unicode



Recommendations for Developers of Indian language Applications

Using TrueType fonts to render Unicode text

Can we simplify handling Unicode text?

Guidelines for development under Linux


Summary of SDL's observations
