 Rendering text in Indian languages
A computer system displays text graphically: each letter or symbol is drawn with a shape taken from an appropriate font. For English and most European languages, standards that relate character codes to specific displayed shapes have been in use for many years. ASCII is the standard code for the Roman alphabet. It is a seven bit code defining 128 characters. Eight bit extensions of ASCII, which permit 256 different characters to be represented, are used for other languages where the letters carry diacritical marks.

In multilingual documents, one will find more than 256 symbols, so a single eight bit code cannot be applied directly. Often one uses the principle of Code Pages, where the interpretation of a code value changes to reflect the display of text in a different language. One selects a specific code page at a time, and that code page maps each code to its displayed shape. Code page switching is meaningful for languages/scripts which employ a limited set of characters in displayed text. Thus one associates a character set with the text, and the display is effected through fonts which conform to that character set.
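The effect of code page switching can be seen with a minimal sketch in Python. It decodes one and the same byte under three code pages that ship with Python's standard codecs (Windows-1252, Windows-1251 and ISO 8859-7). The byte value 0xE4 and the helper name `decode_under` are chosen here purely for illustration.

```python
# One byte value, interpreted under three different code pages.
raw = bytes([0xE4])

# Western European, Cyrillic and Greek code pages from Python's standard codecs.
CODE_PAGES = ["cp1252", "cp1251", "iso8859_7"]

def decode_under(raw_bytes, code_pages):
    """Return {code page name: decoded text} for the same bytes."""
    return {name: raw_bytes.decode(name) for name in code_pages}

views = decode_under(raw, CODE_PAGES)
for name, text in views.items():
    # The stored byte never changes; only its interpretation does.
    print(f"{name}: {text!r} (U+{ord(text):04X})")
```

The stored data is identical in all three cases. Only the selected code page, and hence the font that conforms to it, decides which letter appears on the screen.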

In respect of Indian languages, it has not been possible to define any specific character sets, since the writing systems employ literally thousands of different shapes. What has generally been done in the past is to define a minimal set of shapes (often restricted to about 180-200) and to compose the syllables to be displayed from these shapes. This way, eight bit fonts have been used to display text in most Indian languages/scripts. Unfortunately, the harder problem in dealing with such text on a computer is identifying the linguistic content from the displayed text, because text processing in Indian languages has to be attempted at the level of an akshara (Samyuktakshara), which is essentially a syllable.

Codes for the character sets of different languages may be pooled together to form a set much larger than 256, requiring the use of wider codes (16 bits in the original design) and corresponding fonts. This is the idea behind Unicode, which has become a new standard for text representation. Yet the assignment of codes for languages/scripts which employ a syllabic writing system continues to pose problems for Unicode, since the code space given to such languages is restricted to the basic consonants, vowels and medial vowel forms. One is forced to code syllables as variable length sequences of these codes. This poses fairly serious problems for rendering text as well as for linguistic processing, because it is very difficult to map a variable length code to a displayed shape.
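The variable length nature of syllables in Unicode can be illustrated with a short Python sketch for Devanagari. The pattern below is a simplified assumption made for this example (consonants joined by the virama, an optional medial vowel sign, optional modifiers). It ignores the nukta and other special cases, and it is not a complete or standard definition of an akshara. The names `AKSHARA` and `split_aksharas` are hypothetical.

```python
import re

# Simplified Devanagari classes (assumptions for this sketch only).
CONSONANT = "[\u0915-\u0939]"          # ka .. ha
VIRAMA = "\u094D"                      # halant, joins consonants into conjuncts
MATRA = "[\u093E-\u094C]"              # medial (dependent) vowel signs
MODIFIER = "[\u0901-\u0903]"           # candrabindu, anusvara, visarga
INDEPENDENT_VOWEL = "[\u0905-\u0914]"  # a .. au

# consonant (virama consonant)* [matra | final virama], or a lone vowel,
# followed by any modifiers.
AKSHARA = re.compile(
    f"(?:{CONSONANT}(?:{VIRAMA}{CONSONANT})*(?:{MATRA}|{VIRAMA})?"
    f"|{INDEPENDENT_VOWEL}){MODIFIER}*"
)

def split_aksharas(text):
    """Split Devanagari text into syllables using the simplified pattern."""
    return AKSHARA.findall(text)

# "stri" (woman): sa, virama, ta, virama, ra, vowel sign ii
stri = "\u0938\u094D\u0924\u094D\u0930\u0940"
# "samskrta": sa, anusvara, sa, virama, ka, vowel sign r, ta
samskrta = "\u0938\u0902\u0938\u094D\u0915\u0943\u0924"

for word in (stri, samskrta):
    for akshara in split_aksharas(word):
        codes = " ".join(f"U+{ord(ch):04X}" for ch in akshara)
        print(f"{akshara}: {len(akshara)} code points ({codes})")
```

Under this pattern the first word is a single syllable of six code points, while the second splits into syllables of two, four and one code points. A renderer or text processor therefore cannot locate a syllable boundary without scanning the code sequence.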

The Indian language computing scene continues to pose challenges for software developers, since no standards exist for text rendering. The problem of text rendering is discussed in detail elsewhere in these pages.

As of this writing (June 2006), computing with Indian languages remains complex in the absence of any agreed standard that works well for all the Indian languages. As a result, it is difficult to disseminate information in Indian languages through the web, where multitudes of systems and software have to agree to deal with syllables rather than letters.

Computing with Indian languages will remain complex unless one seriously considers text processing at the level of a syllable with fixed length codes.
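What a fixed length code per syllable buys can be shown with a small Python sketch. It assumes the text has already been segmented into syllables, and it numbers the syllables in first-seen order purely for illustration. The numbering, the 16 bit width and the function names are hypothetical and do not describe any existing standard or encoding.

```python
import struct

def assign_codes(syllables):
    """Give each distinct syllable an integer code in first-seen order
    (a hypothetical numbering, for illustration only)."""
    table = {}
    for syllable in syllables:
        table.setdefault(syllable, len(table))
    return table

def pack_fixed(syllables, table):
    """Store every syllable as exactly one 16 bit big-endian code."""
    return struct.pack(f">{len(syllables)}H", *(table[s] for s in syllables))

def syllable_at(packed, index, table):
    """Fetch the syllable at a given position without scanning the text."""
    inverse = {code: s for s, code in table.items()}
    (code,) = struct.unpack_from(">H", packed, 2 * index)
    return inverse[code]

# Three Devanagari syllables of 2, 4 and 1 Unicode code points respectively.
syllables = ["\u0938\u0902", "\u0938\u094D\u0915\u0943", "\u0924"]
table = assign_codes(syllables)
packed = pack_fixed(syllables, table)

print(len(packed))                    # two bytes per syllable
print(syllable_at(packed, 1, table))  # direct access to the second syllable
```

With one fixed width code per syllable, the n-th syllable always sits at byte offset 2n, so indexing, counting and mapping a code to a displayed shape need no scanning of the text.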
Last updated on 10/26/12