Text Processing Limitations imposed by Word Processing and DTP Software
    Despite the complexity of Indian scripts, word processors and special typesetting programs for preparing and printing documents in these scripts have been created and have remained popular for many years. In fact, DTP services in Indian languages have been popular from the very beginning, first through packages such as PageMaker and Ventura and more recently through Microsoft Word.
    The most important point to observe here is that all the scripts of Indian languages can be displayed comfortably with about 150 to 230 glyphs in the font for each script, and data entry consists of specifying the glyph codes for each akshara that should appear on the printed page. A font of 230 glyphs is feasible only on platforms that correctly handle the font encoding. The Macintosh, for instance, has always supported fonts with close to 250 glyphs and has long been a favourite for DTP work. Microsoft Windows fonts allow about 210 glyphs, still a rich set compared with ISO 8859-1 fonts, which support only about 190.
    Applications that rely on specific (and well-designed) 8-bit fonts for printing Indian language documents use a text representation that corresponds to the glyphs in the font, not to the vowels and consonants of the language.
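The gap between glyph order and spoken order can be seen in a minimal sketch. It assumes an imaginary 8-bit Devanagari font whose glyph codes (the hypothetical `FONT_GLYPHS` table) are invented for illustration. In Devanagari the short vowel sign i is drawn to the left of the consonant it follows in speech, so a glyph-coded string stores it first, whereas a representation built on consonants and vowels stores it after the consonant.

```python
# Sketch only: glyph codes below belong to an imaginary 8-bit font
# (assumption); real fonts each use their own incompatible layout.
FONT_GLYPHS = {
    "ka": 0xA1,
    "ta": 0xA2,
    "sign_i": 0xD0,   # short-i vowel sign, drawn to the LEFT of its consonant
    "sign_aa": 0xD1,  # aa vowel sign, drawn to the right
}


def to_glyph_codes(aksharas):
    """Turn (consonant, vowel_sign or None) pairs, given in spoken order,
    into the glyph codes a glyph-based editor would store."""
    codes = []
    for consonant, vowel_sign in aksharas:
        if vowel_sign == "sign_i":
            # Visual order differs from spoken order: matra glyph first.
            codes.append(FONT_GLYPHS["sign_i"])
            codes.append(FONT_GLYPHS[consonant])
        else:
            codes.append(FONT_GLYPHS[consonant])
            if vowel_sign is not None:
                codes.append(FONT_GLYPHS[vowel_sign])
    return codes


# Syllables "ki" and "ta": spoken order is k, i, t(a)
print([hex(c) for c in to_glyph_codes([("ka", "sign_i"), ("ta", None)])])
# ['0xd0', '0xa1', '0xa2']
```

Any program that wants to search, sort or count syllables in such text must first undo this reordering, using a table that is specific to the font.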

    Typesetting packages based on TeX use a simple ASCII input document consisting of macro definitions for each akshara or, in some cases, plain glyph codes. Data entry is often done with a transliteration scheme that lets the user type ASCII equivalents (names) for the aksharas; a special preprocessing program then converts this input into a proper TeX document. TeX fonts have the advantage that almost all 256 glyphs in a font can be output, something not possible with the fonts used in word processors. Typesetting with TeX can produce documents of superior quality, with carefully controlled line spacing, but the process is very time consuming and a uniform approach across all Indian languages cannot be taken. Besides, TeX does not offer a WYSIWYG user interface. TeX documents are usually viewed or printed via PostScript.
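The preprocessing step can be sketched as a greedy longest-match tokenizer over transliterated ASCII. This is a minimal illustration, not any existing preprocessor: the ASCII names in the hypothetical `TRANSLIT_TABLE` are a small ITRANS-like subset, and the macro names such as `\dKA` are invented, not those of a real TeX package.

```python
# Sketch only: the ASCII names and the TeX macro names (\dKA and so on)
# are hypothetical, not those of any real transliteration scheme or package.
TRANSLIT_TABLE = {
    "a": r"\dA", "aa": r"\dAA", "i": r"\dI",
    "ka": r"\dKA", "kaa": r"\dKAA", "ki": r"\dKI",
    "ma": r"\dMA", "ra": r"\dRA",
    "kSha": r"\dKSHA",   # conjunct k + ssa
    " ": r"\ ",          # explicit inter-word space
}
MAX_KEY = max(len(name) for name in TRANSLIT_TABLE)


def to_tex(text):
    """Convert transliterated ASCII into TeX macro calls by greedy
    longest match against TRANSLIT_TABLE."""
    out, i = [], 0
    while i < len(text):
        # Try the longest candidate first so "kaa" wins over "ka" + "a".
        for size in range(min(MAX_KEY, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in TRANSLIT_TABLE:
                out.append(TRANSLIT_TABLE[chunk])
                i += size
                break
        else:
            raise ValueError(f"no akshara name matches at position {i}: {text[i:]!r}")
    return "".join(out)


print(to_tex("kaama"))    # \dKAA\dMA
print(to_tex("kShama"))   # \dKSHA\dMA
```

A real preprocessor would also wrap this output in a preamble that loads the font and defines each macro; the per-script macro tables are where the lack of a uniform approach across languages shows up.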

    Typesetting with DTP packages that use standard fonts (typically TrueType fonts under Windows) relies on data entry based on glyph codes. Some packages permit data entry through macros, so the user need not worry about glyph codes and can concentrate on the aksharas to be entered. Most DTP packages require specific fonts, and one often cannot produce certain conjuncts, either because the font lacks the corresponding glyphs or because of inherent limitations in the macros.
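Macro-based entry and its failure mode can be sketched as a table lookup. The glyph table `GLYPHS` below describes a hypothetical font and is invented for illustration: the user types akshara names, each name expands to one or more glyph codes, and a conjunct the font designer never drew simply cannot be produced.

```python
# Sketch only: an imaginary font's glyph table (assumption). Entries exist
# only for aksharas the font designer actually drew.
GLYPHS = {
    "ka": [0xA1], "ta": [0xA2], "ra": [0xA3], "sa": [0xA4], "va": [0xA5],
    "kta": [0xE0],          # k+t conjunct has its own ligature glyph
    "tra": [0xA2, 0xF1],    # ta followed by a subscript-ra stroke glyph
}


class MissingGlyphError(Exception):
    """Raised when the font has no glyph sequence for an akshara."""


def expand_macros(akshara_names):
    """Expand akshara names (the 'macros' the user types) into glyph codes."""
    codes = []
    for name in akshara_names:
        if name not in GLYPHS:
            raise MissingGlyphError(f"font has no glyphs for {name!r}")
        codes.extend(GLYPHS[name])
    return codes


print(expand_macros(["va", "kta"]))   # [165, 224]
try:
    expand_macros(["va", "ktra"])     # three-consonant conjunct k+t+r
except MissingGlyphError as err:
    print(err)                        # font has no glyphs for 'ktra'
```

Adding "ktra" requires either a new glyph in the font or a macro rule that builds it from existing pieces; whether the second is visually acceptable depends on the script and the font design.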

    In either case, whether a typesetting program or a word processor is used, the internal representation of the text is specific to that program and varies widely. Worse still, the internal representation includes extensive formatting information, making it difficult, if not impossible, to extract the Indian language text from the document.

    Thus the most important limitation of typesetting and word processing software used to prepare Indian language documents is that it cannot provide an internal representation of the text that both suits effective electronic processing and is consistent with the syllabic writing systems of these languages.

    Secondly, since these packages are designed to run on specific platforms, moving documents across platforms is quite difficult. Even today, the full power of TeX is realized only on Unix systems, and the useful features of Word or WordPerfect can be used only under Microsoft Windows and, to a limited extent, on the Mac.

    It must, however, be admitted that more than 90% of the requirements for Indian language applications relate to plain publishing, and the available packages seem to meet these requirements adequately.

    It should also be mentioned that almost all of these packages, although they permit multilingual printing, cannot automatically print text entered in one script in another script, because they do not support correct transliteration. Invariably, the same text must be entered a second time in the other script. Presenting the same text in several scripts through transliteration is very useful when common information is sent to different states of the country.
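One approximate way to print the same text in another script, shown here only as an illustration and not as a feature of the packages discussed, uses the fact that Unicode lays out the blocks for Devanagari through Malayalam in parallel, following ISCII. A letter at a given offset in one block is usually the same letter at that offset in another. The sketch below assumes this is good enough for simple text; it rejects letters with no counterpart (Tamil, for example, has no KHA) and ignores script-specific spelling conventions, so it is not correct transliteration in the full sense the text asks for.

```python
import unicodedata

# First code point of each script's block in Unicode. These blocks follow
# the ISCII-derived parallel layout, so equal offsets usually mean the same letter.
BLOCK_START = {
    "devanagari": 0x0900, "bengali": 0x0980, "gurmukhi": 0x0A00,
    "gujarati": 0x0A80, "oriya": 0x0B00, "tamil": 0x0B80,
    "telugu": 0x0C00, "kannada": 0x0C80, "malayalam": 0x0D00,
}
# Danda and double danda are encoded once, in the Devanagari block, for shared use.
DANDAS = {"\u0964", "\u0965"}


def transliterate(text, source, target):
    """Approximate script-to-script transliteration by block offset."""
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        offset = ord(ch) - src
        if ch in DANDAS or not 0 <= offset < 0x80:
            out.append(ch)  # shared punctuation and non-Indic text pass through
            continue
        mapped = chr(dst + offset)
        if unicodedata.name(mapped, None) is None:
            # Position unassigned in the target script (e.g. Tamil lacks KHA).
            label = unicodedata.name(ch, f"U+{ord(ch):04X}")
            raise ValueError(f"{label} has no counterpart in {target}")
        out.append(mapped)
    return "".join(out)


# "namaste" in Devanagari -> Telugu; ascii() keeps the output console-safe
print(ascii(transliterate("\u0928\u092E\u0938\u094D\u0924\u0947", "devanagari", "telugu")))
```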

Problems faced with Indian language fonts

    DTP and typesetting packages rely on a specific set of fonts to produce a printout. Often the font has no glyphs for certain conjuncts, particularly those formed from four consonants, and the absence of such glyphs frequently limits what can be printed. Further, even standard fonts used under Windows may not be rendered properly under Unix, because of differences in the font encodings supported on different systems. In most fonts the first thirty-two glyph positions (0 to 31) are not rendered, and the thirty-two positions from 128 through 159 are often not rendered either.
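The arithmetic behind these limits is easy to check. Treating the unrendered ranges mentioned above as blocked (which ranges a particular platform actually skips is an assumption here), an 8-bit font keeps at most 192 usable positions, or 191 once position 127 is also excluded, which matches the figure of about 190 glyphs for ISO 8859-1 fonts. A font with 230 to 250 glyphs therefore has to place glyphs inside these ranges, which only some platforms render.

```python
def usable_slots(blocked_ranges, size=256):
    """Count code positions in an 8-bit font that remain after removing
    the given inclusive (start, end) ranges of unrendered positions."""
    blocked = set()
    for start, end in blocked_ranges:
        blocked.update(range(start, end + 1))
    return sum(1 for code in range(size) if code not in blocked)


# Positions 0-31 and 128-159 unrendered, as described above
print(usable_slots([(0, 31), (128, 159)]))               # 192
# Also dropping 127 (DEL), which ISO 8859-1 treats as a control code
print(usable_slots([(0, 31), (127, 127), (128, 159)]))   # 191
```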

    The number of glyphs needed to form an akshara varies from script to script, and this creates a basic problem for Indian language text. Any internal representation that relies on glyph codes will not work across languages, and unless glyph locations are standardized for each script, rendering software becomes font dependent. The aksharas of Indian scripts also vary greatly in width, from very narrow to very wide, and designing glyphs for such widely varying widths, many of them formed from several glyphs, is a tricky job.

    Among font technologies, Metafont offers the greatest flexibility for meeting the requirements of Indian scripts. The Metafont design by Charles Wikner is thorough, rendering more than a thousand conjuncts of Sanskrit. Unfortunately, the fonts used under Windows and Unix restrict the number of glyphs, so the rich set of conjuncts supported by Wikner's Metafont cannot be realized in practice except through TeX, which offers no WYSIWYG features.

Unicode Fonts for Indian Languages

    In recent years, Unicode has gained significance for multilingual work, and applications supporting Unicode for Indian languages have been created (mostly by Microsoft). However, because the Unicode string for a syllable varies in length, conventional fonts cannot be used. Special OpenType fonts are required, and these are rather difficult to design. In addition, an application supporting Unicode must build in the rules for generating the shape of each syllable. Since these shapes are chosen by the font designer, consistent display is difficult to achieve, and it is well known that different applications render the same Unicode text differently. This goes against the principle that, if text is to be handled uniformly across applications, its representation should not be tied to display requirements.
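The variable length is easy to see in code. A Devanagari syllable is stored as a run of code points: consonants joined by the virama (U+094D), then an optional vowel sign and modifiers such as anusvara. The segmenter below is a simplified sketch under that rule alone; it is not the Unicode grapheme-cluster algorithm of UAX #29 and not the shaping logic of any renderer, and it ignores cases such as ZWJ/ZWNJ.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)


def is_devanagari_consonant(ch):
    """True for Devanagari consonant letters KA..HA and QA..YYA."""
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"


def syllables(text):
    """Split Devanagari text into orthographic syllables (aksharas).

    Simplified rule (an assumption, not UAX #29 or any shaping engine):
    every character starts a new syllable except
      (a) combining marks: vowel signs, anusvara, visarga, nukta, virama;
      (b) a consonant directly after a virama, which extends a conjunct.
    """
    result = []
    for i, ch in enumerate(text):
        joins_previous = unicodedata.category(ch).startswith("M") or (
            i > 0 and text[i - 1] == VIRAMA and is_devanagari_consonant(ch)
        )
        if joins_previous and result:
            result[-1] += ch
        else:
            result.append(ch)
    return result


word = "\u0915\u094D\u0937\u0924\u094D\u0930\u093F\u092F"  # kshatriya
print(len(word))                              # 8 code points
print([len(s) for s in syllables(word)])      # [3, 4, 1]
```

The three syllables occupy 3, 4 and 1 code points, and how each one is drawn (as a single ligature, with half forms, or with a visible virama) is decided by the font and the rendering engine, not by the stored text.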

Last updated on 10/26/12