Rendering a text string
(An approach to displaying aksharas from syllable-level codes)

The structure of the syllable-level code (a triplet) allows a syllable to be rendered simply by specifying the sequence of glyphs to be displayed. The sequence of glyphs depends on the font used. Since the code for a syllable is fixed at two bytes, a simple lookup suffices to obtain the glyphs associated with any syllable supported by the coding scheme. Almost any eight-bit font can be used, provided its glyphs are considered adequate for rendering the text. Given the glyph string and the underlying encoding, this method works across platforms wherever the support required for rendering glyphs is available. Typically, a font conforming to ISO-8859-1 will render properly on most systems, and hence the IITM scheme recommends the use of such fonts.

The IITM scheme allows correct rendering of text using the IITM fonts on all the important platforms: MS Windows, Unix, Macintosh and PostScript devices. A web page prepared using the IITM fonts can be viewed and printed correctly because, in each case, the glyphs are named in conformance with the native encoding of the corresponding platform. This is essentially the ISO-8859 scheme with glyph names mapped properly on each platform.

The lookup is effected through a three-dimensional array corresponding to the triplet representation. The array is specified in a file that the application reads when it is invoked. This approach also permits dynamic selection of the lookup, since the application can reinitialize the array as and when required. Thus the same local-language text may be shown in different fonts, irrespective of the glyph locations, so long as the required set of glyphs is supported. Also, the fact that only eight-bit fonts are needed gives us the freedom to use freely distributed fonts.

The 15-bit syllable code can represent as many as 32,768 aksharas, far more than are required in practice. The overall display process is illustrated below. Note that the actual code is 16 bits wide. The most significant bit is not part of the code; it serves as an indicator that the 16 bits contain information to be interpreted rather than rendered. Such a situation arises when a change of script is specified.
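The role of the most significant bit can be sketched in a few lines of Python. This is only an illustration: the function name and the returned pair are assumptions, and the sketch does not attempt to split the 15-bit value into its triplet fields, since the field widths are not given here.

```python
FLAG_MASK = 0x8000  # most significant bit of the 16-bit word
CODE_MASK = 0x7FFF  # the remaining 15 bits

def classify(word):
    """Split a 16-bit word into (is_control, value).

    is_control is True when the word must be interpreted
    (for example, a change of script) rather than rendered.
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    return bool(word & FLAG_MASK), word & CODE_MASK

print(classify(0x0123))  # (False, 291): a syllable code to be rendered
print(classify(0x8005))  # (True, 5): information to be interpreted
```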

The application may dynamically select a font for displaying the text and generate the glyph string by looking up the related table. Each entry in the table corresponds to an element of the three-dimensional array into which the triplet is mapped. A typical entry in the file that provides the table (the file is given a .tab extension to identify it as a table) has to conform to the simple form shown here. (Please refer to the discussion of the IITM syllable-level coding scheme for additional information.)

[c(i),cnj(j),v(k)]  = g1g2g3..gn

c(i) - is the code assigned for the base consonant in the syllable.

cnj(j) - is the value of the conjunct part of the syllable in the range 1-31

v(k) - is the code assigned to the vowel.

g1g2g3..gn are the eight-bit indexes of the glyphs which must be rendered to display the akshara. In practice, these glyphs will have the required shapes for the consonants, matras and other ligatures needed to build the shape of the akshara correctly. Note that the set g1g2..gn has to be identified individually for each entry in the table. With as many as 12000 or more entries possible in the table, this may appear to be a formidable task. Fortunately, the table can be generated by a program and the entries then fine-tuned manually. Also, since the glyphs for a script do not vary much across fonts, generating the table for a second font is easy once the table for the first font is available.
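A minimal sketch of loading such a table and generating a glyph string is given here. The exact syntax of a .tab file is not specified in this page, so the line format used (decimal numbers, glyph indexes separated by spaces) is an assumption, as are the function names and the sample values. A dictionary keyed by the triplet stands in for the three-dimensional array.

```python
import re

# Assumed line format: "[c,cnj,v] = g1 g2 ... gn", all numbers in decimal.
ENTRY = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]\s*=\s*(.*)")

def load_table(lines):
    """Build {(c, cnj, v): glyph bytes} from the lines of a table file."""
    table = {}
    for line in lines:
        m = ENTRY.match(line.strip())
        if not m:
            continue  # skip blank or unrecognised lines
        c, cnj, v = (int(m.group(i)) for i in (1, 2, 3))
        # bytes() rejects values outside 0..255, i.e. non eight-bit indexes
        table[(c, cnj, v)] = bytes(int(g) for g in m.group(4).split())
    return table

def glyph_string(table, syllables):
    """Concatenate the glyph sequences of a list of syllable triplets."""
    return b"".join(table[s] for s in syllables)

# Invented sample entries, for illustration only.
sample = ["[1,1,1] = 107", "[1,1,2] = 107 97", "[2,3,1] = 200 115"]
table = load_table(sample)
print(list(glyph_string(table, [(1, 1, 2), (2, 3, 1)])))  # [107, 97, 200, 115]
```

Reinitializing the lookup for another font then amounts to calling load_table again with a different file.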

   It will also be seen that the set of glyphs g1g2..gn for any akshara will vary from font to font. The manner in which one builds up an akshara by choosing appropriate glyphs may also vary depending on the specific form required to be shown since an akshara may be shown in more than one form.

Another important point to note here is that the syllable is associated not with a script but with a language. Hence it is easy to show the same syllable in multiple scripts merely by switching the fonts and the associated tables. Transliteration across scripts can therefore be achieved without altering the stored text.
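The switch of fonts and tables described above can be sketched as follows. The two tables, the font names and every glyph index are invented for illustration; real tables would be read from the .tab files of the fonts concerned.

```python
# Hypothetical tables for two fonts (two scripts); all values are invented.
TABLES = {
    "font_a": {(1, 1, 1): b"\x6b", (2, 1, 3): b"\x72\xd2"},
    "font_b": {(1, 1, 1): b"\xa5", (2, 1, 3): b"\xb0\xc1\xc2"},
}

def render(text, font):
    """Return the font name and the glyph string for the syllables in text."""
    table = TABLES[font]
    return font, b"".join(table[s] for s in text)

# The text is stored once, as syllable triplets, and never changes.
text = [(1, 1, 1), (2, 1, 3)]
for font in TABLES:
    print(render(text, font))
```

Only the table and the font differ between the two renderings; the syllable codes are shared.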

The glyph string is the basis for generating the display. One need not stop at displaying the glyphs using the API provided by the operating system for text strings. One can also pass the glyph string to other applications that provide different visual representations, such as an image of the text, an XML description of the display, HTML output or a PostScript representation. Thus the fixed-length syllable-level code can be used effectively to produce a variety of display formats, and one can choose the format dynamically merely by passing the glyph string to an appropriate module. This has significant benefits when generating displays for web browsers that are known to support one format well, say PDF. The local-language text may then be sent to the browser as a PDF file on demand, which guarantees that the text can be viewed.

The on-line demo at the acharya web site demonstrates this well.
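The idea of handing one glyph string to interchangeable output modules might be sketched as follows. The two modules, their names and the font name ExampleFont are assumptions made for illustration and are not the modules used by the IITM software. The PostScript output is only a fragment and assumes that a current point has already been set.

```python
def to_html(glyphs, font):
    """Emit the glyphs as numeric character references in the given font."""
    refs = "".join("&#{};".format(g) for g in glyphs)
    return '<span style="font-family: {}">{}</span>'.format(font, refs)

def to_postscript(glyphs, font, size=12):
    """Emit a PostScript fragment that shows the glyphs (octal escapes)."""
    octal = "".join("\\{:03o}".format(g) for g in glyphs)
    return "/{} findfont {} scalefont setfont ({}) show".format(font, size, octal)

# The output format is chosen at run time by name.
OUTPUTS = {"html": to_html, "ps": to_postscript}

def emit(glyphs, font, fmt):
    return OUTPUTS[fmt](glyphs, font)

glyphs = bytes([107, 210])  # invented glyph indexes
print(emit(glyphs, "ExampleFont", "html"))
print(emit(glyphs, "ExampleFont", "ps"))
```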

The essence of the IITM scheme of representing text lies in the power of the fixed-length code for a well-chosen set of syllables. This approach truly separates the rendering process from the encoding process, so applications can render text easily through common APIs that are proven and known to work across systems. While one may argue that the scheme does not cater to all possible syllables, the approach is scalable if the available code space is extended to 32 bits. Variable-length representations such as Unicode always require applications to know how an akshara should be rendered and to check whether this rendering is possible with the font provided (an OpenType font in this case). Thus an application supporting Unicode for Indian languages is necessarily language or script dependent. There is no simple method which allows the same Unicode text (the same set of syllables) to be displayed in multiple scripts.

Working with Unicode Fonts

The mapping function can also specify sixteen-bit glyph index values on the right-hand side of the entries in the table. This allows sixteen-bit fonts to be used. Since arbitrary Unicode values cannot be assigned to the glyphs, the range E000-E7FF, which lies in the area reserved for user-defined codes (the Private Use Area), may be utilized for this purpose and the glyphs located in this region. In fact, OpenType fonts for Indian languages use this range to define the composite glyphs used in forming the aksharas.

It turns out that there is no need to use OpenType fonts when 16-bit character codes are used. The OpenType font is specifically contrived to map variable-length codes to specific sequences of glyphs. It has been observed that the existing OpenType fonts (Mangal under Windows, the BBC fonts and so on) cannot really cater to the full complement of ligatures used in traditional print. Yet one can do quite well with eight-bit fonts supporting 240 or so glyphs. The only difficulty with these eight-bit fonts is that they are not guaranteed to render properly on every platform, and an eight-bit font that provides compatibility cannot really have more than about 188 glyphs. With 16-bit fonts, we can use many more than 256 glyphs and achieve high-quality typesetting, so long as the underlying system has the support to render each glyph properly. It is observed that glyphs in the user-defined area are rendered properly only under MS Windows. The Mac and Linux systems seem to have problems (as of this writing) in handling glyphs in the range E000-E7FF.

There are many advantages to using traditional TrueType fonts if they can support enough glyphs to meet the requirements of rendering text. The IITM Unicode font has the required number of glyphs to handle all the scripts. Each script has 256 glyph locations, many more than the 188 possible with eight-bit fonts.
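A short sketch of how a glyph index could be turned into a character in the range E000-E7FF and then into UTF-8 follows. The layout assumed here, with script slot s occupying the 256 code points starting at E000 + 256 x s, is a guess made for illustration; the actual layout of the IITM Unicode font is not described in this page. The function and constant names are likewise hypothetical.

```python
PUA_BASE = 0xE000        # start of the range used for the glyphs
PUA_LIMIT = 0xE7FF       # end of the range used for the glyphs
SLOTS_PER_SCRIPT = 256   # glyph locations available to each script

def glyph_char(script_slot, glyph_index):
    """Map (script slot, glyph index) to one character in E000-E7FF."""
    if not 0 <= glyph_index < SLOTS_PER_SCRIPT:
        raise ValueError("glyph index must be in 0..255")
    cp = PUA_BASE + script_slot * SLOTS_PER_SCRIPT + glyph_index
    if not PUA_BASE <= cp <= PUA_LIMIT:
        raise ValueError("script slot outside the E000-E7FF range")
    return chr(cp)

# Two invented glyph indexes in script slot 2, encoded as UTF-8 for HTML.
text = "".join(glyph_char(2, g) for g in (0x10, 0x41))
utf8 = text.encode("utf-8")
print([hex(ord(ch)) for ch in text])  # ['0xe210', '0xe241']
print(utf8)                           # b'\xee\x88\x90\xee\x89\x81'
```

Under this assumed layout the range E000-E7FF holds eight blocks of 256 glyph locations.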

The development team at SDL has come up with a utility to convert .llf files to HTML using the IITM Multilingual Unicode font. The HTML file is produced in UTF-8 format. You can compare the display of the same text using standard Unicode rendering against the display produced with the IITM Unicode fonts. Here is the link to the relevant page, which provides the details.

Converting Glyph strings to llf

   Applications supporting cut/copy and paste operations may also be able to map the displayed glyphs back to their corresponding llf characters. Basically, the glyphs of the font are to be interpreted as symbols of text which may be parsed and the text converted through Lex and Yacc based tools.

In syllabic writing systems, a one-to-one mapping between the internally stored text string and the glyph string used to display it is not always required. However, a one-to-one mapping simplifies the reverse process of identifying a syllable from a glyph. In that case, the number of glyphs in the font can well be of the order of 10000 or more if alternate renderings of text are also to be accommodated. In practice, however, one takes advantage of the fact that a syllable can be shaped according to rules using a much smaller set of glyphs, perhaps of the order of hundreds. With this approach, getting back a syllable code from the glyphs is a challenge.
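The reverse process can be illustrated with a deliberately simple sketch. A greedy longest-match scan over an inverted table stands in here for the Lex and Yacc based tools mentioned above; it is not the method used by the IITM software. The table values and function names are invented, and the sketch ignores the real difficulty that different syllables, or different segmentations, may produce the same glyphs.

```python
def invert(table):
    """Build {glyph bytes: syllable} from {syllable: glyph bytes}."""
    reverse = {}
    for syllable, glyphs in table.items():
        reverse.setdefault(glyphs, syllable)  # first syllable wins on a tie
    return reverse

def glyphs_to_syllables(glyphs, reverse):
    """Recover syllables by always taking the longest matching glyph run."""
    longest = max(len(k) for k in reverse)
    out, i = [], 0
    while i < len(glyphs):
        for n in range(min(longest, len(glyphs) - i), 0, -1):
            hit = reverse.get(glyphs[i:i + n])
            if hit is not None:
                out.append(hit)
                i += n
                break
        else:
            raise ValueError("no syllable matches glyphs at offset %d" % i)
    return out

# Invented forward table: syllable triplet -> glyph indexes.
table = {(1, 1, 1): b"\x6b", (1, 1, 2): b"\x6b\x61", (2, 1, 1): b"\x72"}
print(glyphs_to_syllables(b"\x6b\x61\x72\x6b", invert(table)))
# [(1, 1, 2), (2, 1, 1), (1, 1, 1)]
```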

Encoding standards such as Unicode suffer from this problem, since a multibyte (variable-length) string is used to represent a syllable. It is common practice in these cases to employ two internal buffers while dealing with displayed text, one for the text in the form required by the encoding method and the other for the glyph string. Cut/copy and paste operations always use these two buffers to maintain the correct relationship between the text string and the glyph string.

In the IITM approach, the application is required to maintain two buffers, one for the syllable codes and the other for any representation of the displayed text that can directly relate to the font, such as a glyph string or an XML-based description of the string.
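A minimal sketch of the two-buffer arrangement is given here, assuming the second buffer is a plain glyph string. The class, its method names and the sample table are hypothetical. Recording the glyph offset at which each syllable starts lets a selection made on the displayed glyphs be mapped back to whole syllables.

```python
from bisect import bisect_right

class DisplayBuffer:
    def __init__(self, table):
        self.table = table          # {syllable: glyph bytes}
        self.syllables = []         # buffer 1: syllable codes
        self.glyphs = bytearray()   # buffer 2: glyph string
        self.starts = []            # glyph offset where each syllable begins

    def append(self, syllable):
        self.starts.append(len(self.glyphs))
        self.syllables.append(syllable)
        self.glyphs += self.table[syllable]

    def syllable_at(self, glyph_offset):
        """Index of the syllable that owns the glyph at glyph_offset."""
        return bisect_right(self.starts, glyph_offset) - 1

    def copy(self, glyph_start, glyph_end):
        """Syllables covering glyphs[glyph_start:glyph_end]."""
        if not 0 <= glyph_start < glyph_end <= len(self.glyphs):
            raise ValueError("empty or out-of-range selection")
        first = self.syllable_at(glyph_start)
        last = self.syllable_at(glyph_end - 1)
        return self.syllables[first:last + 1]

# Invented table and text, for illustration only.
table = {(1, 1, 1): b"\x6b", (1, 1, 2): b"\x6b\x61", (2, 1, 1): b"\x72"}
buf = DisplayBuffer(table)
for s in [(1, 1, 2), (2, 1, 1), (1, 1, 1)]:
    buf.append(s)
print(buf.copy(1, 3))  # [(1, 1, 2), (2, 1, 1)]
```

A selection that starts in the middle of a syllable is widened to the whole syllable, so what is copied is always a valid run of syllable codes.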

In the MS Windows version of the IITM multilingual editor, the internal buffer for the glyph string is managed through Rich Text controls, and hence copy/paste operations are straightforward when text is pasted into other applications. However, once pasted, the text no longer conforms to the llf standard, so editing is possible only at the glyph level and not at the syllable level.
 


 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

 

Last updated on 11/07/12