We have already
seen the conceptual basis for Uniscribe. An application will examine the
Unicode string to be processed and perform whatever linguistic processing
is required. The result to be displayed will then be given to Uniscribe.
Uniscribe will apply the rules of the writing system consistent with the
language and return to the application the associated glyph string to be
used with the font specified for the script.
Uniscribe knows the rules of the writing system for a language (associated with the script) and decides whether the display will be consistent with those rules by querying the Open Type Library Services (OTLS), which should return to the querying program information about the features supported in the font. Uniscribe checks whether a supported feature satisfies the writing-system rule to be implemented and, if it does, selects the glyphs to be shown. Otherwise, Uniscribe may default to some other form of display for the syllable. Uniscribe therefore cannot render text by itself, since it must know whether a specific rule can be implemented with the glyphs provided in the OpenType font. Clearly some default behaviour is expected from Uniscribe when a rule cannot be implemented in the required manner. There can be a choice of displays even in the default behaviour, since alternate forms for a syllable are always permitted. The real issue is deciding which form is better suited to the application.
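As a rough illustration of this decision, here is a minimal Python sketch. The `Rule` class, `choose_form`, `render_plan` and the form labels are hypothetical; only the tags `akhn` (akhand ligatures) and `half` (half forms) are real OpenType Devanagari feature tags. It does not query a real font or call Uniscribe or OTLS; the font's feature set is simply passed in as an assumption.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical model of the decision described above: a writing-system rule
# is honoured only if the font reports the OpenType feature it depends on.
# Rule names and form labels are illustrative, not Uniscribe internals.

@dataclass(frozen=True)
class Rule:
    name: str       # hypothetical rule label
    feature: str    # OpenType feature tag the rule relies on, e.g. "akhn"
    preferred: str  # form shown when the font supports the feature
    fallback: str   # default form shown otherwise

def choose_form(rule: Rule, font_features: set[str]) -> str:
    """Return the preferred form if the font supports the feature, else the default."""
    if rule.feature in font_features:
        return rule.preferred
    return rule.fallback

def render_plan(rules: list[Rule], font_features: set[str]) -> dict[str, str]:
    """Map each rule name to the form that would be displayed with this font."""
    return {r.name: choose_form(r, font_features) for r in rules}

if __name__ == "__main__":
    rules = [
        Rule("ksha", "akhn", "ksha ligature", "ka + virama + ssa"),
        Rule("half-ka", "half", "half form of ka", "ka + visible virama"),
    ]
    # A font that supports half forms but not the akhand ligatures.
    print(render_plan(rules, {"half"}))
```

Note that the fallback here is a single fixed string per rule; a real renderer choosing among several permitted alternate forms would need more information about what the font offers, which is exactly the difficulty raised above.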
The limitations of Uniscribe may be examined from three different perspectives:
- Conceptual problems with Unicode.
- The extent to which the rules of the writing system have been correctly implemented.
- The default behaviour of Uniscribe, which is perhaps influenced by the features supported in an OpenType font.

Problems specific to Unicode
Unicode puts the script
ahead of the language and assumes that the writing system is influenced
by the language. This means that we cannot associate a new language with
the script without modifying Uniscribe.
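To make the point concrete: Uniscribe first splits a string into runs by script (its `ScriptItemize` function) before shaping. The Python sketch below does not call Uniscribe; it only shows what a script-keyed dispatch looks like. The block ranges come from the Unicode code charts, while the `SHAPERS` table and the function names are hypothetical. Because a character carries only its script, nothing in the lookup can tell Sanskrit from Hindi.

```python
from typing import Optional

# Code-point ranges of three Unicode blocks (from the Unicode code charts).
SCRIPT_BLOCKS = {
    "Devanagari": (0x0900, 0x097F),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
}

# Hypothetical table of per-script shaping modules; note that it is keyed by
# script alone, so there is no place to attach a language.
SHAPERS = {
    "Devanagari": "devanagari-shaper",
    "Tamil": "tamil-shaper",
}

def script_of(ch: str) -> str:
    """Classify a single character by the Unicode block it falls in."""
    cp = ord(ch)
    for script, (first, last) in SCRIPT_BLOCKS.items():
        if first <= cp <= last:
            return script
    return "Other"

def shaper_for(ch: str) -> Optional[str]:
    """Pick a shaper from the character's script; the language plays no part."""
    return SHAPERS.get(script_of(ch))

if __name__ == "__main__":
    # Devanagari KA, Tamil TA and Telugu TA: the Telugu letter finds no
    # shaper in this hypothetical table.
    for ch in ("\u0915", "\u0BA4", "\u0C24"):
        print(f"U+{ord(ch):04X}", script_of(ch), shaper_for(ch))
```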
Perhaps there will be no need to associate a new language with a script in practice, for it may be argued that a script is a means of giving a visual representation to a sound and that a language is specified by the sounds its speakers utter. This is the wrong view to take, because what is important is that a reader should be able to identify the sound associated with a given shape in order to get at the linguistic content. So long as the reader knows that the same sound can be represented by different shapes in different scripts, he or she can comfortably read the text. Text in a given language can therefore be written in any script that correctly relates shapes to sounds, and it is common practice in India to write text in a language in many different scripts:
- Devanagari, Brahmi, Grantha, Sharada, Phonetic alphabet, Telugu
- Tamil: Modern Tamil script, Vattezhuthu (700 AD to 1300 AD), Tamil Brahmi
- Devanagari, Modi
- Arabic script, Devanagari
It will be almost impossible for us to use any of the above scripts with Uniscribe except Devanagari and Tamil, because the writing-system rules are very different for each script and the requirement cannot be handled simply by creating a suitable OpenType font.
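The relation between shapes and sounds is easy to see at the encoding level: the Unicode Indic blocks follow a largely parallel layout (inherited from ISCII), so many Devanagari letters sit at the same offset as their Telugu counterparts. The Python sketch below is a naive, hypothetical `devanagari_to_telugu` that relies only on that offset; it is not a complete transliterator, and producing the right code points says nothing about whether a renderer can display them correctly, which is the concern of this section.

```python
import unicodedata

DEVANAGARI_FIRST = 0x0900
TELUGU_FIRST = 0x0C00
BLOCK_SIZE = 0x80

def devanagari_to_telugu(text: str) -> str:
    """Naively map Devanagari code points onto the parallel Telugu positions."""
    out = []
    for ch in text:
        cp = ord(ch)
        if DEVANAGARI_FIRST <= cp < DEVANAGARI_FIRST + BLOCK_SIZE:
            target = chr(cp - DEVANAGARI_FIRST + TELUGU_FIRST)
            # Some positions are unassigned in Telugu (e.g. the danda, which
            # Telugu text borrows from Devanagari), so refuse rather than guess.
            if unicodedata.name(target, None) is None:
                raise ValueError(f"no Telugu counterpart for U+{cp:04X}")
            out.append(target)
        else:
            out.append(ch)  # characters outside the block pass through
    return "".join(out)

if __name__ == "__main__":
    # "namaste" in Devanagari becomes the same syllables in Telugu.
    print(devanagari_to_telugu("\u0928\u092e\u0938\u094d\u0924\u0947"))
```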
Extent to which the rules of the writing system can be implemented
This is largely a matter of exhaustively listing the writing conventions, including all the alternate forms of all the syllables seen in common use. The person who does this must have both linguistic knowledge and knowledge of the script to vouch for the correctness of each rule. Such people are rare, and some of them are known to have an aversion to computers! You need experts who have studied the development of typography over nearly a hundred and fifty years just to identify how text was typeset in the past so as to remain consistent with the writing seen on palm-leaf manuscripts and other writing media common in the country. Today, many states in the country have greatly simplified the rules by standardizing on a small set of shapes that can fit on a manual typewriter. So it may be virtually impossible to present text consistent with an older manuscript if Uniscribe implements only the modern rules.
As of now, a rule can be implemented only if the associated OpenType font provides glyphs consistent with the rule. An OpenType font for Devanagari would become truly unwieldy if it had to support glyphs conforming to all the conventions that have been followed over the years. Thus Uniscribe is also limited by the capabilities of the font.
Default behaviour of Uniscribe
The default behaviour of Uniscribe is dictated by the extent to which the required features are supported in the OpenType font. The rules for syllable formation cannot be ignored either. Given a Unicode string, identifying the syllables to be rendered is not an easy task if characters such as the zero width joiner (U+200D) and zero width non-joiner (U+200C) are present. The state machine that examines the string can indeed get confused when such characters appear in the input; in fact, this happens with Microsoft Word. It is also seen that arbitrarily long syllables may be input, and that Uniscribe defaults to amusing shapes for certain syllables. Try a syllable with four "ra"s. It will be quite difficult for any application to decide on an appropriate form for the default rendering of such a syllable unless it knows what alternatives are available. This requires exhaustive querying of the Open Type Library Services and can make the application unnecessarily complex.
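Both difficulties show up in even a toy syllable grammar. The Python sketch below is a simplified, hypothetical Devanagari segmenter written with the standard `re` module; it is not Uniscribe's state machine and ignores many cases (reph handling, for instance). The repetition `(virama joiner? consonant)*` has no upper bound, so four "ra"s joined by viramas are accepted as a single syllable, and an optional joiner has to be threaded through every rule.

```python
import re
from typing import List

# Simplified Devanagari character classes (Unicode code points).
CONSONANT = "[\u0915-\u0939\u0958-\u095F]\u093C?"  # consonant, optional nukta
VIRAMA = "\u094D"
JOINER = "[\u200C\u200D]"                           # ZWNJ or ZWJ
MATRA = "[\u093E-\u094C\u0962\u0963]"               # dependent vowel signs
MODIFIER = "[\u0900-\u0903]"                        # candrabindu, anusvara, visarga
VOWEL = "[\u0904-\u0914]"                           # independent vowels

# (virama joiner? consonant)* is unbounded, so arbitrarily long consonant
# clusters are accepted as one "syllable".
SYLLABLE = (
    f"(?:{CONSONANT}(?:{VIRAMA}{JOINER}?{CONSONANT})*"
    f"(?:{MATRA}|{VIRAMA}{JOINER}?)?{MODIFIER}?"
    f"|{VOWEL}{MODIFIER}?)"
)
# Anything that cannot start a syllable (e.g. a stray joiner) stands alone.
SEGMENTER = re.compile(f"{SYLLABLE}|.", re.DOTALL)

def syllables(text: str) -> List[str]:
    """Split text into syllables under the simplified grammar above."""
    return SEGMENTER.findall(text)

if __name__ == "__main__":
    four_ra = "\u0930\u094d\u0930\u094d\u0930\u094d\u0930"
    print(len(syllables(four_ra)))  # one syllable, however many ra's are chained
```

A grammar like this can say where a syllable ends, but not which of the permitted shapes to draw for it; that still depends on what the font provides.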
String processing is best attempted when the quantum of information handled at a time is a data item of a known, fixed size, such as one, two or four bytes. Regular expression matching works best when this condition is satisfied. In the absence of a fixed-size quantum, any string processing becomes complex and unwieldy.
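The encodings of Unicode differ in exactly this respect. The small Python sketch below (the helper `code_units` is hypothetical; it only uses the standard `str.encode`) counts how many code units each character takes: UTF-8 is variable-length, UTF-16 needs two units for characters outside the Basic Multilingual Plane, such as those in the Grantha block, and UTF-32 gives one fixed four-byte unit per code point.

```python
from typing import List

def code_units(text: str, encoding: str, unit_bytes: int) -> List[int]:
    """Number of code units each character occupies in the given encoding."""
    return [len(ch.encode(encoding)) // unit_bytes for ch in text]

# 'a' (ASCII), KA (Devanagari, U+0915) and a code point from the Grantha
# block (U+11315), which lies outside the Basic Multilingual Plane.
SAMPLE = "a\u0915\U00011315"

if __name__ == "__main__":
    print("UTF-8 :", code_units(SAMPLE, "utf-8", 1))
    print("UTF-16:", code_units(SAMPLE, "utf-16-le", 2))
    print("UTF-32:", code_units(SAMPLE, "utf-32-le", 4))
```

Even with a fixed-size code point, a syllable is still a variable-length run: the Devanagari syllable "ste" (SA, VIRAMA, TA, VOWEL SIGN E) is four code points, while "na" is one.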