Language Enabling versus Localization

Technical issues in providing multilingual support within operating systems.
Contents: Basic concepts | Language enabling | Localization

Basic Concepts

  During the past few years, interest in working with multilingual documents has grown considerably, especially for information display on web pages. Software developers, too, are always looking for ways to make the same package work with different languages; an example would be a text editor that can be used to prepare text in multiple languages, all in the same document. As of now (2005), the standard approach to generating multilingual documents relies on the use of Unicode.

  During the early years of the development of computers, user interfaces relied on text represented in simple ASCII, so the text displayed in a document consisted of letters from just one language. Applications such as web browsers would examine the Content-Type information specified at the beginning of an HTML page to figure out how the information in the page should be interpreted. This is the concept of the character set, where each numeric code in the text corresponds to a specific letter of the character set of a language.

  The character encoding specifies how a given code should be interpreted. It must be remembered that only the codes from 32 to 127 have a standard interpretation across different character sets. In particular, the codes in the upper ASCII range (128 to 255) are never guaranteed to be interpreted the same way, even for a given character set, because applications (typically web browsers) must know the intricacies of each character set to display the letters. For example, an HTML document encoded in the standard ISO-8859-1 character set will not render properly under Netscape on a Macintosh system!
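
  A short Python sketch illustrates the point: the same byte value in the upper ASCII range decodes to a different letter under each character set, while a code below 128 means the same thing everywhere.

    # One byte from the upper ASCII range (128-255): its meaning
    # depends entirely on the character set assumed by the application.
    data = bytes([0xE9])
    for charset in ("latin-1", "cp1253", "koi8-r", "mac-roman"):
        print(charset, "->", data.decode(charset))
    # latin-1 -> é   cp1253 -> ι   koi8-r -> И   mac-roman -> È

    # A code below 128 is safe: it is 'A' under every character set.
    print(bytes([0x41]).decode("latin-1"), bytes([0x41]).decode("koi8-r"))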

  A test page to check the rendering capability of a web browser is available at

http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/entities.html


Language Enabling

  For a computer application to work with multiple languages at the same time, there must be a mechanism for handling different character sets simultaneously. Only recently, after Unicode gained wide acceptance, have applications started dealing with character codes which also carry information about the character set in use. Unicode has made it possible for the characters of most world languages to be uniquely represented, by assigning a block of character locations to each language. Text based on Unicode is thus easy to interpret, and when dealing with a particular language, appropriate fonts can be selected to display the letters. The application is also required to assign keys for data entry so that text in different languages may be entered without difficulty.
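
  As a small illustration, the Python sketch below shows how each script sits in its own range of Unicode code points, so a character's code identifies its script without any external character set information:

    import unicodedata

    # Each script occupies its own block of code points.
    for ch in ("A", "α", "अ", "த"):
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
    # U+0041 LATIN CAPITAL LETTER A
    # U+03B1 GREEK SMALL LETTER ALPHA
    # U+0905 DEVANAGARI LETTER A
    # U+0BA4 TAMIL LETTER TA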

  Conventional applications which work with eight-bit character codes cannot handle the double-byte coded text of a Unicode document. Hence it is necessary to rewrite text editors, or develop new versions, which can handle Unicode. Microsoft Word is one of the applications capable of dealing with Unicode.
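
  The limitation is easy to demonstrate in Python: an eight-bit character set simply has no slots for such letters, whereas a Unicode-aware application stores multi-byte codes.

    text = "नमस्ते"                      # Devanagari text

    # An eight-bit character set cannot represent these letters at all.
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        print("latin-1 has no codes for this text")

    print(text.encode("utf-16-be"))     # two bytes per code point
    print(text.encode("utf-8"))         # three bytes per Devanagari code point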

  Language enabling is a concept whereby an application allows data entry and display in the required language, through dynamic selection of the language during data entry. In an application that is enabled for a particular language, what is seen on the screen, or printed, will be text in that language. With Unicode, an application selects the appropriate 'locale' for data entry in a particular language. The locale corresponds to a selection within the operating system to accept and display text in the preferred language. Different locales have to be installed in an operating system to cater to different languages.
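
  In Python, for instance, selecting a locale looks roughly as follows. The name "hi_IN.UTF-8" is only an example, typical of Unix systems, and the locale must actually be installed in the operating system for the call to succeed:

    import locale

    # Ask the operating system for the Hindi (India) locale.
    # Locale names are platform dependent; this one is an assumption.
    try:
        locale.setlocale(locale.LC_ALL, "hi_IN.UTF-8")
        print("Locale selected:", locale.getlocale())
    except locale.Error:
        print("The hi_IN locale is not installed on this system")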

  Data entry may not always be straightforward if the letters of the language bear no resemblance to the Roman alphabet, but many applications project a keyboard on the screen and allow data entry through mouse clicks. In all these cases, the current practice is largely one-keystroke-one-glyph, where each glyph shown on the screen corresponds to an individual letter of the alphabet. This approach does pose difficulties for languages in which displaying a single conjunct character requires a combination of two or more glyphs.
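
  The Devanagari conjunct क्ष illustrates the difficulty: a single written shape is stored as three Unicode code points, so no one-keystroke-one-glyph mapping covers it. A short Python sketch:

    import unicodedata

    ksha = "\u0915\u094D\u0937"        # KA + VIRAMA + SSA
    print(ksha, "=", len(ksha), "code points")
    for ch in ksha:
        print(f"  U+{ord(ch):04X}", unicodedata.name(ch))
    # U+0915 DEVANAGARI LETTER KA
    # U+094D DEVANAGARI SIGN VIRAMA
    # U+0937 DEVANAGARI LETTER SSA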

  In the case of languages whose character sets do not figure in Unicode, there is a real problem. However, one may always resort to the trick of telling the system that a standard character set is in use, but that a specified font should be used for the display. This works well in practice so long as the document can be prepared consistently with some character set. By and large, this has been the approach taken to displaying Indian language characters in most computer applications.

  Though Unicode does allocate space for Indian languages, the limited 128 positions are practically useless when it comes to dealing with literally thousands of aksharas. So even displays on web pages resort to specifying the ISO-8859-1 encoding and using almost 186 font glyphs to form the aksharas. This works well under Windows and Unix, where the ISO-8859-1 encoding is well honoured, but not on a Mac, where many of the ISO-8859-1 characters are not recognized properly.
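
  The trick can be sketched in Python. The glyph table below is invented purely for illustration; real eight-bit Indian-language fonts each used their own tables:

    # Hypothetical table mapping letters to glyph positions (128-255)
    # in a custom font. The document then claims to be ISO-8859-1, and
    # the page looks right only when that font is applied.
    GLYPH_TABLE = {"क": 0xC0, "्": 0xC1, "ष": 0xC2}   # invented values

    def to_font_bytes(text):
        return bytes(GLYPH_TABLE[ch] for ch in text)

    encoded = to_font_bytes("क्ष")
    print(encoded)                    # b'\xc0\xc1\xc2'
    print(encoded.decode("latin-1"))  # gibberish without the custom font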

  Language enabling thus allows only the preparation of quality documents using the application. Microsoft Word is known to excel in this for most world languages, and for some Indian languages as well (with a difficult, non-intuitive data entry method). The data entry method differs from one Indian language to another, however, a consequence of differences in the Unicode assignments.


Localization

  Localization is a totally different concept, in which the entire interaction with the application, including all the commands, takes place in the specific language. This calls for major enhancements to the system software to allow interpretation of text strings in different languages. In an application that is localized for a particular language, one may never see Roman text on the screen, and all computing, including the naming of files, may be done in the specified language. In other words, an application supporting localization for a language can provide an effective user interface in that language, so a person need not know English to run the application.
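
  The standard mechanism in most software is a message catalogue: every user-visible string is looked up in a table of translations for the chosen language. Here is a minimal sketch with Python's gettext module; the "myapp" domain and "./locale" directory are assumptions, and real catalogues (.mo files) have to be compiled and installed there:

    import gettext

    try:
        hindi = gettext.translation("myapp", localedir="./locale",
                                    languages=["hi"])
        _ = hindi.gettext              # look strings up in the catalogue
    except FileNotFoundError:
        _ = gettext.gettext            # fall back to untranslated strings

    print(_("Open file"))              # Hindi string if a catalogue exists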

  Localization is difficult to achieve for languages which have a large number of displayed shapes (samyuktakshars), such as the Indian languages. This is a consequence of the fact that localizations still rely on the assumption that a small set of letters (128) is all that will be encountered in text processing!

  Localization has been reasonably successful in respect of many world languages with small character sets. The reason may be attributed to the short fixed-length code (just one byte) required to represent the letters of the alphabet. Even Japanese and Chinese applications, which have to deal with over twenty thousand characters, have succeeded only because people found ways of assigning unique fixed-length codes to the characters of these languages.
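
  A quick Python check shows the idea: each Chinese or Japanese character is a single code point, so a two-byte fixed-length code covers it.

    # Each character is one code point, two bytes in UTF-16.
    for ch in "日本語":
        print(ch, f"U+{ord(ch):04X}", ch.encode("utf-16-be"))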

  It is in respect of Indian languages that localization has failed, because no standards exist for uniquely identifying the different aksharas using a fixed-length code. The problem is unique to those languages of the world in which the writing system corresponds to syllables rather than individual letters. Where the writing system is not based on syllabic representation, an eight-bit (or two-byte) fixed-length code is more than satisfactory.
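
  The contrast is easy to see with a rough akshara splitter in Python (a simplification of the real rules, for illustration only): the syllables of a Devanagari word occupy varying numbers of code points, so no fixed-length code can index them directly.

    import unicodedata

    VIRAMA = "\u094D"

    def aksharas(text):
        """Group each consonant with its dependent signs and with
        consonants joined to it by a virama (simplified rules)."""
        units, current = [], ""
        for ch in text:
            if (current and unicodedata.category(ch) != "Mn"
                    and not current.endswith(VIRAMA)):
                units.append(current)
                current = ""
            current += ch
        if current:
            units.append(current)
        return units

    for unit in aksharas("नमस्ते"):
        print(unit, "->", len(unit), "code point(s)")
    # न -> 1, म -> 1, स्ते -> 4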

  The world over, few developers have succeeded in localizing applications for Indian languages. This is a difficult problem to handle, since one has to process thousands of aksharas when interpreting a text string.

  As of July 2004, only Microsoft Word and some Office 2000/XP products allow language enabling and localization for some of the Indian languages. The situation may change, however! Localization using Unicode has had its share of problems for Indian scripts: it turns out that the responsibility of rendering text also rests with the application when variable-length syllables have to be processed. We have a page devoted to this problem, where a given document in Unicode is rendered in totally different ways by different web browsers.



Language enabling has been accomplished through the use of Unicode and locales, which permit data entry in different languages.

Support for locales is built into the operating system, but users have to install specific locales to cater to specific languages.

When locales are switched, as is required for data entry in a new language, the keyboard mapping changes automatically. On-screen keyboards are also popular.

Typing punctuation marks is usually a problem with Indian languages/scripts, due to the limited number of keys available for data entry.
 



In respect of Indian languages, very few instances of localization are seen, though a few specific applications have shown that localization is possible, if not without problems.