Conversion Utilities: LLF to Unicode

The llf-to-Unicode utility converts multilingual text in llf into Unicode, in either UTF-8 or UCS-2. The conversion treats the syllable as the unit to be represented. Fixed tables are used to convert llf codes to Unicode, while a lexical analyzer is used to convert Unicode back to the llf format.
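
A minimal sketch of the table-driven direction follows. It assumes that each llf code stands for one whole syllable and maps to a fixed Unicode sequence. This document does not give the actual llf code values or table contents, so the codes 0x01 to 0x03, the three Devanagari entries and the function name llf_to_utf8 are hypothetical placeholders. Only the lookup-and-copy structure is meant to be illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL table: the real llf code values are not published here.
   Each entry maps one syllable code to its UTF-8 byte sequence. */
typedef struct {
    unsigned short llf_code;   /* one code per syllable (akshara) */
    const char *utf8;          /* Unicode sequence the syllable expands to */
} LlfEntry;

static const LlfEntry llf_table[] = {
    { 0x01, "\xE0\xA4\x95" },                         /* KA:   U+0915 */
    { 0x02, "\xE0\xA4\x95\xE0\xA4\xBF" },             /* KI:   U+0915 U+093F */
    { 0x03, "\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA4\xB7" }  /* KSSA: U+0915 U+094D U+0937 */
};

/* Convert n llf codes to a NUL-terminated UTF-8 string by table lookup.
   Returns the number of bytes written (excluding the NUL), or -1 if a
   code is not in the table or the output buffer is too small. */
static int llf_to_utf8(const unsigned short *codes, size_t n,
                       char *out, size_t out_size)
{
    size_t i, j, used = 0;
    assert(out != NULL && out_size > 0);
    for (i = 0; i < n; i++) {
        const char *seq = NULL;
        size_t len;
        for (j = 0; j < sizeof llf_table / sizeof llf_table[0]; j++) {
            if (llf_table[j].llf_code == codes[i]) {
                seq = llf_table[j].utf8;
                break;
            }
        }
        if (seq == NULL)
            return -1;                      /* unknown syllable code */
        len = strlen(seq);
        if (used + len + 1 > out_size)
            return -1;                      /* no room for bytes plus NUL */
        memcpy(out + used, seq, len);
        used += len;
    }
    out[used] = '\0';
    return (int)used;
}
```

Because every entry expands a complete syllable, a conjunct such as KSSA is a single table hit even though it becomes three Unicode code points.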

Unicode provides special codes, such as the zero width joiner (ZWJ, U+200D) and the zero width non-joiner (ZWNJ, U+200C), that force the renderer to use a specific form. If these codes are present in the Unicode text, they may not be converted properly, because the IITM scheme has no concept of forced rendering.
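
One way to make this loss explicit is to remove the codes before conversion. The sketch below is a hypothetical pre-processing step, not part of the distributed utility. It deletes ZWNJ (U+200C, bytes E2 80 8C) and ZWJ (U+200D, bytes E2 80 8D) from UTF-8 text in place and returns how many were dropped, so that a caller could warn that rendering hints were discarded. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* True if p starts with the UTF-8 encoding of ZWNJ (E2 80 8C)
   or ZWJ (E2 80 8D). */
static int is_zero_width_joiner(const unsigned char *p, size_t remaining)
{
    return remaining >= 3 && p[0] == 0xE2 && p[1] == 0x80 &&
           (p[2] == 0x8C || p[2] == 0x8D);
}

/* Remove every ZWJ/ZWNJ from a NUL-terminated UTF-8 string in place.
   Returns the number of joiners removed. */
static size_t strip_zero_width_joiners(char *s)
{
    unsigned char *src, *dst;
    size_t len, removed = 0;
    assert(s != NULL);
    len = strlen(s);
    src = dst = (unsigned char *)s;
    while (*src != '\0') {
        /* bytes left in the original string from src onward */
        size_t remaining = len - (size_t)(src - (unsigned char *)s);
        if (is_zero_width_joiner(src, remaining)) {
            src += 3;               /* skip the 3-byte joiner */
            removed++;
        } else {
            *dst++ = *src++;        /* keep every other byte */
        }
    }
    *dst = '\0';
    return removed;
}
```

For example, KA + virama + ZWJ + SSA (a request for the half form of KA) becomes the plain conjunct KA + virama + SSA.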

The current Unicode assignment for Indian scripts is restricted to the basic vowels, the consonants and the dependent (medial) vowel signs. Conjuncts are written as consonants joined by the virama, so arbitrarily long samyuktakshars (conjunct consonants) can be represented. The IITM scheme permits only a limited set of samyuktakshars, although that set is itself substantial. Long samyuktakshars may therefore be rendered differently after conversion.
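
To see why Unicode allows conjuncts of any length, note that a samyuktakshar is simply a chain of the form consonant, virama, consonant, and so on. The sketch below measures the longest such chain in UCS-2 text. It is a hypothetical helper, not the lexical analyzer used by the utility, and it only recognizes Devanagari (consonants U+0915 to U+0939 and U+0958 to U+095F, virama U+094D). A converter with a fixed syllable inventory could compare the result against the longest conjunct its tables cover, a limit this document does not state.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short ucs2_t;   /* one 16-bit UCS-2 code unit */

#define VIRAMA 0x094D            /* Devanagari sign virama (halant) */

/* Devanagari consonants, including the nukta forms U+0958..U+095F. */
static int is_consonant(ucs2_t c)
{
    return (c >= 0x0915 && c <= 0x0939) || (c >= 0x0958 && c <= 0x095F);
}

/* Scan the chain C (VIRAMA C)* starting at text[*pos], advance *pos past
   it, and return how many consonants it contains. A trailing virama
   that is not followed by a consonant is left unconsumed. */
static int scan_conjunct(const ucs2_t *text, size_t n, size_t *pos)
{
    size_t i = *pos;
    int consonants = 0;
    while (i < n && is_consonant(text[i])) {
        consonants++;
        i++;
        if (i + 1 < n && text[i] == VIRAMA && is_consonant(text[i + 1]))
            i++;                 /* virama joins this consonant to the next */
        else
            break;
    }
    *pos = i;
    return consonants;
}

/* Length, in consonants, of the longest conjunct found in the text. */
static int longest_conjunct(const ucs2_t *text, size_t n)
{
    size_t pos = 0;
    int longest = 0;
    assert(text != NULL || n == 0);
    while (pos < n) {
        if (is_consonant(text[pos])) {
            int k = scan_conjunct(text, n, &pos);
            if (k > longest)
                longest = k;
        } else {
            pos++;               /* vowels, signs, spaces, other scripts */
        }
    }
    return longest;
}
```

For instance, KSSA (U+0915 U+094D U+0937) counts as 2 and STRA (U+0938 U+094D U+0924 U+094D U+0930) as 3. Unicode imposes no upper bound on this count.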

The utility is a simple program written in ANSI C and will compile properly on most systems, including MS Windows. Separate utilities are used for conversion from llf and to llf. There is also a provision for converting between UCS-2 and UTF-8.
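
The UCS-2 to UTF-8 direction is a standard bit-level encoding. A minimal sketch follows. It assumes that the UCS-2 text has already been read into 16-bit code units in host byte order. A UCS-2 file may use either byte order, usually signalled by a leading U+FEFF byte order mark, and this sketch does not handle that. The function name ucs2_to_utf8 is illustrative and is not taken from the utility.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short ucs2_t;   /* one 16-bit UCS-2 code unit (BMP only) */

/* Encode n UCS-2 code units as NUL-terminated UTF-8 in out.
   Returns the number of bytes written (excluding the NUL), or -1 if the
   buffer is too small or a surrogate (U+D800..U+DFFF) appears, since
   surrogates are not valid characters in UCS-2. */
static long ucs2_to_utf8(const ucs2_t *in, size_t n,
                         unsigned char *out, size_t out_size)
{
    size_t i, used = 0;
    assert(out != NULL && out_size > 0);
    for (i = 0; i < n; i++) {
        unsigned c = in[i];
        size_t need = c < 0x80 ? 1 : (c < 0x800 ? 2 : 3);
        if (c >= 0xD800 && c <= 0xDFFF)
            return -1;
        if (used + need + 1 > out_size)
            return -1;
        if (need == 1) {                 /* 0xxxxxxx */
            out[used++] = (unsigned char)c;
        } else if (need == 2) {          /* 110xxxxx 10xxxxxx */
            out[used++] = (unsigned char)(0xC0 | (c >> 6));
            out[used++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {                         /* 1110xxxx 10xxxxxx 10xxxxxx */
            out[used++] = (unsigned char)(0xE0 | (c >> 12));
            out[used++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[used++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out[used] = '\0';
    return (long)used;
}
```

Code points below U+0080 take one byte, those below U+0800 take two, and the rest of the Basic Multilingual Plane takes three. The Indic blocks, which start at U+0900, therefore need three bytes per code point in UTF-8 but two in UCS-2.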

SDL, IIT Madras does not endorse the use of Unicode for linguistic processing of text in Indian languages. The lab's experience in dealing with Unicode text is described in full in a separate section.

Download llf2unicode

Linux Binary llf2uni.tar.gz

Linux Binary uni2llf.tar.gz

Convert UCS-2 to UTF-8


