The llf to Unicode utility converts multilingual text into Unicode format
(UTF-8 as well as UCS-2). The conversion is based on the principle that
syllables are the units to be represented. Fixed tables are used for
converting llf codes to Unicode, while a lexical analyzer is used for
converting Unicode to the llf format.
Unicode provides special codes, such as the zero width modifiers (for
example the zero width joiner and non-joiner), to force rendering to specific
forms. These codes, if present in the Unicode text, may not be converted
properly, as there is no concept of forced rendering in the IITM scheme.
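
Since forced-rendering codes have no counterpart in the IITM scheme, one option a caller might consider is to drop them before conversion. The sketch below is only an illustration under that assumption; the function name strip_zero_width is hypothetical, and nothing here is taken from the utility's own source. It treats the text as an array of 16-bit UCS-2 code units and removes U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER) in place.

```c
#include <assert.h>
#include <stddef.h>

#define ZWNJ 0x200Cu /* ZERO WIDTH NON-JOINER */
#define ZWJ  0x200Du /* ZERO WIDTH JOINER */

/* Hypothetical helper (not part of the utility described here):
 * removes ZWNJ and ZWJ code units from a UCS-2 buffer in place
 * and returns the new number of code units. */
size_t strip_zero_width(unsigned short *text, size_t len)
{
    size_t in, out = 0;

    assert(text != NULL || len == 0);
    for (in = 0; in < len; in++) {
        if (text[in] != ZWNJ && text[in] != ZWJ) {
            text[out++] = text[in]; /* keep every other code unit */
        }
    }
    return out;
}
```

Removing these codes discards the rendering hint they carried, so this is a trade-off rather than a lossless step.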
The current Unicode assignment for Indian scripts is restricted to the basic
vowels, consonants and medial vowel forms, which allows arbitrarily long
samyuktakshars to be represented. The IITM scheme permits only a limited set
of samyuktakshars, though this is a substantial number by itself. Thus there
may be differences in the rendering of long samyuktakshars.
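
To make the point about long samyuktakshars concrete, the sketch below counts the consonants in a Devanagari conjunct encoded in Unicode as consonant, virama, consonant, and so on. It assumes Devanagari only (consonants U+0915 to U+0939, virama U+094D) and UCS-2 code units; the function names are hypothetical. The limit the IITM scheme actually imposes is not stated here, so the code only reports the length and leaves any comparison to the caller.

```c
#include <assert.h>
#include <stddef.h>

#define DEV_VIRAMA 0x094Du /* DEVANAGARI SIGN VIRAMA */

/* Basic Devanagari consonants KA..HA occupy U+0915..U+0939. */
static int is_dev_consonant(unsigned int c)
{
    return c >= 0x0915u && c <= 0x0939u;
}

/* Hypothetical helper: returns how many consonants make up the
 * conjunct that starts at text[0], following the pattern
 * consonant (virama consonant)*. Returns 0 if text does not
 * start with a consonant. */
size_t conjunct_consonants(const unsigned short *text, size_t len)
{
    size_t i, count;

    assert(text != NULL || len == 0);
    if (len == 0 || !is_dev_consonant(text[0])) {
        return 0;
    }
    count = 1;
    i = 1;
    while (i + 1 < len && text[i] == DEV_VIRAMA
           && is_dev_consonant(text[i + 1])) {
        count++;
        i += 2; /* skip the virama and the consonant it joined */
    }
    return count;
}
```

A sequence such as SA, virama, TA, virama, RA yields three consonants; a converter with a fixed set of samyuktakshars would have to decide how to handle clusters it cannot represent.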
The utility is a simple one written in ANSI C and will compile properly on
most systems, including MS Windows. Separate utilities are used for
conversion from llf and to llf. There is also a provision for converting
between UCS-2 and UTF-8.
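
The conversion between UCS-2 and UTF-8 follows the standard UTF-8 bit layout, which can be sketched as below. This is a minimal illustration, not the utility's own code; the names ucs2_to_utf8 and utf8_to_ucs2 are hypothetical. The decoder handles only sequences of one to three bytes (the range UCS-2 can hold) and does not reject overlong forms or surrogate values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encodes one UCS-2 code unit (0x0000..0xFFFF)
 * as UTF-8. Writes 1 to 3 bytes to out and returns the byte count. */
size_t ucs2_to_utf8(unsigned int cu, unsigned char *out)
{
    assert(out != NULL);
    assert(cu <= 0xFFFFu);
    if (cu < 0x80u) {                 /* 0xxxxxxx */
        out[0] = (unsigned char)cu;
        return 1;
    }
    if (cu < 0x800u) {                /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0u | (cu >> 6));
        out[1] = (unsigned char)(0x80u | (cu & 0x3Fu));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0u | (cu >> 12));
    out[1] = (unsigned char)(0x80u | ((cu >> 6) & 0x3Fu));
    out[2] = (unsigned char)(0x80u | (cu & 0x3Fu));
    return 3;
}

/* Hypothetical helper: decodes one UTF-8 sequence of up to 3 bytes
 * into a UCS-2 code unit. Returns the number of bytes consumed, or 0
 * if the input is truncated, malformed, or needs more than 16 bits. */
size_t utf8_to_ucs2(const unsigned char *in, size_t len, unsigned int *cu)
{
    assert(in != NULL && cu != NULL);
    if (len >= 1 && in[0] < 0x80u) {
        *cu = in[0];
        return 1;
    }
    if (len >= 2 && (in[0] & 0xE0u) == 0xC0u
        && (in[1] & 0xC0u) == 0x80u) {
        *cu = ((in[0] & 0x1Fu) << 6) | (in[1] & 0x3Fu);
        return 2;
    }
    if (len >= 3 && (in[0] & 0xF0u) == 0xE0u
        && (in[1] & 0xC0u) == 0x80u && (in[2] & 0xC0u) == 0x80u) {
        *cu = ((in[0] & 0x0Fu) << 12) | ((in[1] & 0x3Fu) << 6)
              | (in[2] & 0x3Fu);
        return 3;
    }
    return 0;
}
```

For example, DEVANAGARI LETTER KA (U+0915) encodes to the three bytes E0 A4 95, and decoding those bytes gives back 0x0915.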
SDL, IIT Madras does not endorse the use of Unicode for linguistic
processing of text in Indian languages. The experiences
of the lab in dealing with Unicode text are fully described in an independent