Limitations of Uniscribe
  We have already seen the conceptual basis for Uniscribe. An application examines the Unicode string to be processed and performs whatever linguistic processing is required. The result to be displayed is then given to Uniscribe. Uniscribe applies the rules of the writing system for the language and returns to the application the glyph string to be used with the font specified for the script.

  Uniscribe implements the rules of the writing system for a language (associated with the script) and decides whether the display can be made consistent with those rules by querying the OpenType Layout Services (OTLS). This returns information to the querying program about the features supported in the font. Uniscribe checks whether a supported feature satisfies the writing system rule to be implemented and, if it does, selects the glyphs to be shown. Otherwise, Uniscribe may fall back to a default form of display for the syllable.

  Uniscribe cannot render text by itself, since it must know whether a specific rule can be implemented with the glyphs provided in the OpenType font. Clearly, some default behaviour is expected from Uniscribe when a rule cannot be implemented in the required manner. There can be a choice of displays even in the default behaviour, since alternate forms for a syllable are always permitted. The real issue is deciding which form is better suited to the application.

  The limitations of Uniscribe may be examined from three different perspectives.

  • Problems specific to Unicode assignments themselves.
  • The extent to which the rules of the writing system have been correctly implemented.
  • The default behaviour of Uniscribe, which is perhaps influenced by the features supported in an OpenType font.
Conceptual problems with Unicode assignments

  Unicode puts the script ahead of the language and assumes that the writing system is influenced by the language. This means that we cannot associate a new language with the script without modifying Uniscribe.

  Perhaps there will be no need for this in practice, for it may be argued that a script is a means of giving a visual representation to a sound and a language is specified by the sounds the speaker utters. This is the wrong view to take, because what matters is that a person should be able to identify the sound associated with a given shape in order to get the linguistic content. So long as the person knows that the same sound can be represented by different shapes in different scripts, he/she can comfortably read the text. So text in a given language could be written in any script that correctly relates shapes to sounds. It is common practice in India to write text in a language in many different scripts.

Sanskrit - Devanagari, Brahmi, Grantha, Sharada, Phonetic alphabet, Telugu
Tamil - Modern Tamil script, Vattezhuthu (700 AD-1300 AD), Tamil Brahmi
Marathi - Devanagari, Modi
Sindhi - Arabic script, Devanagari

  It will be almost impossible for us to use any of the above scripts with Uniscribe except Devanagari and Tamil, for the writing system rules are very different for each script and the requirement cannot be handled simply by creating a suitable OpenType font.

Extent to which the rules of the writing system can be implemented

  This is largely a matter of exhaustively listing the writing conventions, including all the alternate forms for all the syllables seen in common use. The person who does this must have both linguistic knowledge and knowledge of the script to vouch for the correctness of each rule. Such persons are rare and some of them are known to have an aversion to computers! You require experts who have learnt about the development of typography over nearly a hundred and fifty years just to identify how manuscripts were typeset earlier, consistent with the writing seen on palm leaf manuscripts or other writing media common in the country. Today, many states in the country have greatly simplified the rules by standardizing on a small set of shapes which can fit into a manual typewriter. So it may be virtually impossible for a person to present text consistent with some older manuscript if Uniscribe implements only the modern rules.

  As of now, a rule can be implemented only if the associated OpenType font provides glyphs consistent with the rule. An OpenType font for Devanagari will become truly unwieldy should it become necessary to support glyphs conforming to the conventions which have been followed for years. Thus Uniscribe will also be limited by the capabilities of the font.

Default behaviour of Uniscribe

  The default behaviour of Uniscribe is dictated by the extent to which the required features are supported in the OpenType font. Also, the rules for syllable formation cannot be ignored. Given a Unicode string, identifying the syllables which have to be rendered is not an easy task if the string contains Unicode characters such as the zero width joiner and the zero width non-joiner. The state machine which examines the string can indeed get confused if such characters are present in the input. In fact, this happens with Microsoft Word.
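
  To illustrate why the joiners complicate syllable identification, here is a minimal sketch of a syllable splitter for Devanagari. It is not Uniscribe's algorithm. The function name `split_syllables`, the simplified grammar (consonant cluster, optional vowel sign, optional modifier) and the decision to keep a joiner that follows a virama inside the same syllable are all assumptions made for this example.

```python
# Simplified Devanagari syllable splitter (illustrative only, not Uniscribe).
# Assumed grammar:  C (VIRAMA [ZWJ|ZWNJ] C)* [VIRAMA [ZWJ|ZWNJ]] [MATRA] [MODIFIER]
#               or  V [MODIFIER]
ZWNJ, ZWJ = "\u200c", "\u200d"   # zero width non-joiner / joiner
VIRAMA = "\u094d"                # Devanagari sign virama (halant)

def is_consonant(ch):
    return "\u0915" <= ch <= "\u0939"

def is_vowel(ch):        # independent vowels
    return "\u0905" <= ch <= "\u0914"

def is_matra(ch):        # dependent vowel signs
    return "\u093e" <= ch <= "\u094c"

def is_modifier(ch):     # candrabindu, anusvara, visarga
    return "\u0901" <= ch <= "\u0903"

def split_syllables(text):
    syllables = []
    i, n = 0, len(text)
    while i < n:
        start = i
        if is_consonant(text[i]):
            i += 1
            # Absorb every "virama [joiner] consonant" continuation.
            while i < n and text[i] == VIRAMA:
                j = i + 1
                if j < n and text[j] in (ZWJ, ZWNJ):
                    j += 1                  # the joiner stays in this syllable
                if j < n and is_consonant(text[j]):
                    i = j + 1
                else:
                    i = j                   # trailing virama, cluster ends
                    break
            if i < n and is_matra(text[i]):
                i += 1
            if i < n and is_modifier(text[i]):
                i += 1
        elif is_vowel(text[i]):
            i += 1
            if i < n and is_modifier(text[i]):
                i += 1
        else:
            i += 1                          # anything else stands alone
        syllables.append(text[start:i])
    return syllables

# "namaste": NA, MA, SA + VIRAMA + TA + vowel sign E
print(split_syllables("\u0928\u092e\u0938\u094d\u0924\u0947"))
# KA + VIRAMA + ZWNJ + SSA: one syllable here only because of the joiner rule above
print(split_syllables("\u0915\u094d\u200c\u0937"))
```

  Even in this small sketch the joiners need a special case inside the cluster loop. A splitter that lacks it would cut the second string after the virama and treat the joiner as a stray character, which is the kind of confusion described above.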

  Uniscribe assumes that arbitrarily long syllables may also be input and defaults to amusing shapes for certain syllables. Try a syllable with four "ra"s. It will be quite difficult for any application to decide on an appropriate form for the default rendering of a syllable unless it knows what alternatives are available. This requires exhaustive querying of the OpenType Layout Services (OTLS) and can make the application unnecessarily complex.
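
  For readers who wish to try the four "ra" test, the input string can be built as follows. This is only a sketch for producing test input; the helper name `conjunct` is an assumption made for this example, and how the result is displayed depends entirely on the shaping engine and the font in use.

```python
# Build a conjunct of arbitrary length by joining consonants with the virama.
RA = "\u0930"        # Devanagari letter RA
VIRAMA = "\u094d"    # Devanagari sign virama (halant)

def conjunct(consonants):
    """Return the Unicode string for a conjunct made of the given consonants."""
    return VIRAMA.join(consonants)

four_ra = conjunct([RA] * 4)

print(four_ra)                                 # paste into an application to see its default rendering
print(len(four_ra))                            # 7 code points: 4 consonants + 3 viramas
print([f"U+{ord(c):04X}" for c in four_ra])    # the code point sequence
```

  Nothing in the encoding limits the number of consonants passed to `conjunct`, which is why a shaping engine must have some default for syllables that no font provides a glyph for.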

  String processing is best attempted when the quantum of information handled at a time is a data item of a known fixed size, such as one byte, two bytes or even four bytes. Regular expression matching works best only when this is satisfied. In the absence of a fixed size quantum, any string processing becomes complex and unwieldy.
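
  The point about a fixed size quantum can be seen by measuring syllables under the common Unicode encodings. The sketch below is illustrative only; the function name `sizes` and the two sample syllables are choices made for this example.

```python
# Compare the storage size of Devanagari syllables under different encodings.
def sizes(syllable):
    """Return the length of a syllable in code points and in encoded bytes."""
    return {
        "code_points": len(syllable),
        "utf8_bytes": len(syllable.encode("utf-8")),
        "utf16_bytes": len(syllable.encode("utf-16-le")),
        "utf32_bytes": len(syllable.encode("utf-32-le")),
    }

na = "\u0928"                     # NA: a syllable of one code point
ste = "\u0938\u094d\u0924\u0947"  # SA + VIRAMA + TA + vowel sign E: one syllable, four code points

print(sizes(na))
print(sizes(ste))
```

  A fixed width encoding such as UTF-32 fixes the size of a code point, but the syllable, which is the unit the writing system works with, still varies in length. Any matching at the syllable level therefore has to work on variable length runs.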
