Acharya Web Site
Unicode rendering examples.

The following examples illustrate the variations seen across applications when rendering Unicode strings in Indian languages. This page was set up in August 2007, after refinements to Unicode rendering had been introduced in recent versions of operating systems and applications. This site also includes earlier pages that highlight the differences encountered in different applications.

A short explanation appears to the right of each screenshot. It identifies the nature of the rendering problem, whether it stems from the basic encoding itself or from the idiosyncrasies of the application.


Devanagari rendering under Ubuntu and Windows Vista

Tamil rendering under Ubuntu and Windows Vista

Data entry problems

Application dependent rendering

Problems with arbitrarily long syllables

[Screenshots: Google results page in Sanskrit, under Ubuntu and under Windows]

Devanagari (Ubuntu)

Shown at the left is the Google results page for a search for pages that refer to the Acharya site. The page displayed by Firefox under Ubuntu shows displaced matras as well as linearized rendering of syllables.

The same page displayed by Firefox on a Windows Vista system is rendered correctly.



[Screenshots: Google page containing Tamil text, under Ubuntu and under Windows]

Tamil (Ubuntu)

Shown at the left is a page from Google that includes Unicode text in Tamil. The rendering under Ubuntu is totally inappropriate: the medial vowel shapes are in the wrong place and, worse still, syllables with the vowel "u" are rendered completely wrong.

Tamil has the advantage of a simpler script where syllable formation is relatively easy. Unfortunately, the application uses inappropriate algorithms to render syllables.
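At the encoding level, a Tamil syllable is simply a consonant followed by one dependent vowel sign. The sketch below uses Python's standard `unicodedata` module; the consonant KA is only an illustrative choice. It builds one row of syllables and hints at where a naive renderer can go wrong: the E, EE and AI signs are stored after the consonant but drawn to its left, the O sign is drawn in two parts around the consonant (and can even be typed as two separate codes that NFC composes), and the U and UU signs fuse with the consonant into consonant-specific shapes that the font must supply. Drawing codes in storage order is one plausible cause of the misplaced vowel shapes, though the screenshots alone do not confirm it.

```python
import unicodedata

KA = "\u0b95"  # TAMIL LETTER KA

# Dependent vowel signs, in the order AA, I, II, U, UU, E, EE, AI, O, OO, AU
VOWEL_SIGNS = ["\u0bbe", "\u0bbf", "\u0bc0", "\u0bc1", "\u0bc2",
               "\u0bc6", "\u0bc7", "\u0bc8", "\u0bca", "\u0bcb", "\u0bcc"]

# Each syllable is the consonant followed by exactly one vowel sign
row = [KA] + [KA + sign for sign in VOWEL_SIGNS]
print(" ".join(row))

# O is a two-part sign (E drawn on the left, AA on the right); typed as two codes,
# NFC normalization composes them into the single precomposed sign U+0BCA
split_o = KA + "\u0bc6\u0bbe"  # KA + VOWEL SIGN E + VOWEL SIGN AA
print(unicodedata.normalize("NFC", split_o) == KA + "\u0bca")  # True
```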

The rendering under Windows is correct, though one must keep in mind the fact that Unicode for Tamil does not address all the requirements.




Data Entry related issues

The display at the left is the Wordpad screen under Windows Vista, and it highlights the problems of Unicode data entry: two different strings have been made to display identically. This example shows that preparing a text string for a query may be an extremely difficult task.

Zero-width non-joiners (ZWNJ) can cause confusion when the user linearizes a syllable. Wordpad allows a zero-width non-joiner to be entered, but Notepad does not.
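The ZWNJ problem can be checked directly: a cluster typed plainly and the same letters with a ZWNJ after the virama are different strings, and Unicode normalization does not remove the joiner, so a plain comparison fails. One possible mitigation for search, an assumption of this sketch rather than anything the page or a standard prescribes, is a matching key that discards the zero-width joiners. `search_key` is a hypothetical helper.

```python
import unicodedata

ZWNJ, ZWJ = "\u200c", "\u200d"


def search_key(s: str) -> str:
    """Hypothetical matching key: NFC-normalize, then drop zero-width joiners."""
    return "".join(ch for ch in unicodedata.normalize("NFC", s) if ch not in (ZWNJ, ZWJ))


as_conjunct = "\u0915\u094d\u0937"       # KA + VIRAMA + SSA
linearized = "\u0915\u094d\u200c\u0937"  # same letters, ZWNJ requests a visible virama
print(as_conjunct == linearized)                          # False
print(search_key(as_conjunct) == search_key(linearized))  # True
```

The key deliberately throws away the user's rendering preference, so it suits query matching but not the stored text itself.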

The problem here is that the same syllable can be created in two different ways, once with a single code and once with two codes, which results in ambiguity.
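The single-code versus two-code ambiguity can be reproduced with Python's standard `unicodedata` module. U+0958 DEVANAGARI LETTER QA is canonically equivalent to KA followed by SIGN NUKTA, and because it is a composition exclusion, both NFC and NFD map it to the two-code form. Normalizing both the stored text and the query to the same form before comparing is the standard Unicode remedy for this particular case; it is shown here as a sketch, not as the approach this page advocates.

```python
import unicodedata

precomposed = "\u0958"       # DEVANAGARI LETTER QA: one code
decomposed = "\u0915\u093c"  # DEVANAGARI LETTER KA + SIGN NUKTA: two codes, same appearance

print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", precomposed) == decomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```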

The nukta character is not a linguistic entity, and its Unicode assignment is as inappropriate as the assignment of medial vowel forms. The linguistic structure demands that codes be assigned so that syllables are written clearly and can be identified without confusion.

Application dependent rendering

The two screenshots at the left illustrate that Unicode rendering is necessarily application-dependent. The applications are Wordpad and the word processor in Microsoft Works. Wordpad renders the text correctly, while the Works word processor shows a totally incorrect display.

The applications run under Windows Vista. 

The assignment of Unicode values to Indian language letters is such that a syllable is represented by a sequence of several codes, and hence several bytes. To accommodate different renderings of the same syllable, Unicode allows the use of special characters, referred to here as modifiers. Different applications do not handle these modifiers properly. Moreover, the rendering algorithm can never be standardized, because of the variations permitted in syllable representation. This is why rendering is basically the responsibility of the application.
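A short Python sketch makes the multi-code representation concrete. The syllable क्षि is stored as four codes, which take twelve bytes in UTF-8. The cluster क्ष can also be spelled three ways, with the zero width joiner (ZWJ, U+200D) or zero width non-joiner (ZWNJ, U+200C) after the virama requesting a half form or a visible virama. The joiners only express a preference; what actually appears depends on the font and the application.

```python
# क्षि : KA + VIRAMA + SSA + VOWEL SIGN I, displayed as one syllable
syllable = "\u0915\u094d\u0937\u093f"
print([f"U+{ord(c):04X}" for c in syllable])  # ['U+0915', 'U+094D', 'U+0937', 'U+093F']
print(len(syllable.encode("utf-8")))          # 12 (three bytes per Devanagari code)

# One cluster, three encodings; the joiner only hints at the preferred rendering
variants = {
    "default (conjunct if the font has one)": "\u0915\u094d\u0937",
    "half form requested (ZWJ)": "\u0915\u094d\u200d\u0937",
    "visible virama requested (ZWNJ)": "\u0915\u094d\u200c\u0937",
}
for label, text in variants.items():
    print(f"{label}: {len(text)} codes")
```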

Note how the medial vowel shapes are incorrectly placed as well as incorrectly rendered.



Arbitrarily long syllables

Unicode rendering often assumes that it should be possible to handle arbitrarily long syllables. It is easy to force syllable formation on an application using valid Unicode values that have only a symbolic value for the code, not strictly a linguistic one. The codes for the medial vowels and the nukta are a few examples.

These codes can confuse the application when arbitrarily long syllables are attempted.

The display at the left is the result of entering a long series of halanth codes (almost 400 in this case), at which point the data entry state machine starts to misbehave.
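One defensive check, not discussed on this page, comes from the Stream-Safe Text Format in Unicode Standard Annex #15, which limits runs of non-starters (characters with a nonzero canonical combining class) to 30. The Devanagari halanth (virama) has combining class 9, so a run of 400 breaks that limit by a wide margin. This is a simplified sketch: UAX #15 counts non-starters after NFKD decomposition, whereas the code below inspects each character as stored, and because matras have combining class 0 it would not flag long chains of vowel signs.

```python
import unicodedata

MAX_NON_STARTERS = 30  # run-length limit of the UAX #15 Stream-Safe Text Format


def longest_non_starter_run(s: str) -> int:
    """Longest run of consecutive characters with a nonzero canonical combining class.

    Simplification: UAX #15 counts non-starters after NFKD decomposition;
    this sketch only looks at each character as stored.
    """
    longest = run = 0
    for ch in s:
        if unicodedata.combining(ch) != 0:
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest


def looks_stream_safe(s: str) -> bool:
    return longest_non_starter_run(s) <= MAX_NON_STARTERS


pathological = "\u0915" + "\u094d" * 400      # KA followed by 400 halanth codes
print(unicodedata.combining("\u094d"))         # 9
print(longest_non_starter_run(pathological))   # 400
print(looks_stream_safe(pathological))         # False
```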

A copy of the file is available for you to verify this. Please note the differences in rendering between Wordpad and Notepad. One has to accept that rendering cannot be totally divorced from the application. When each application decides how a specific case will be handled, uniformity across applications is lost.

While one can dismiss this as a pathological example, one should remember that Unicode allows a user to compose a syllable with special modifier codes. Hence it may be virtually impossible to discern the internal representation of a displayed string, information that is essential when typing query strings for searches.
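When the displayed form hides the internal representation, listing the codes with their Unicode names is a simple way to see what a string really contains, including invisible joiners. A minimal sketch using Python's standard `unicodedata.name`; `describe` is a hypothetical helper name.

```python
import unicodedata


def describe(s: str) -> list[str]:
    """List each code point with its Unicode name, exposing invisible characters."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in s]


for line in describe("\u0915\u094d\u200c\u0937"):
    print(line)
# U+0915 DEVANAGARI LETTER KA
# U+094D DEVANAGARI SIGN VIRAMA
# U+200C ZERO WIDTH NON-JOINER
# U+0937 DEVANAGARI LETTER SSA
```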

Last updated on 10/26/12