software applications developed at IIT Madras are built around a text representation
which corresponds to syllables (aksharas). This coding scheme preserves
the lexicographic ordering of the aksharas and conforms to the ordering
specified for the vowels and consonants. The sort.pl utility is a Perl program
that sorts the text in a .llf file. Such a file can be prepared using the
Multilingual Editor, or generated programmatically by indexing text files
in Indian scripts.
The utility sorts the words in the input file and writes a sorted version
as output. It assumes that the input file is made up of lines of text,
where each line has one or more words. Sorting always starts from the first
akshara of the first word on each line. Each line must therefore start with
an akshara and not a blank space for the sorting to work properly.
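The behaviour described above can be sketched in Python. This is only an illustration, not the logic of sort.pl itself: it assumes each line is available as a byte string whose byte-wise order matches the lexicographic order of the aksharas (the actual .llf layout is not described here), and the function name sort_lines is hypothetical.

```python
def sort_lines(lines):
    """Sort encoded lines, comparing from the first akshara of each line.

    Assumption: each line is a bytes object whose byte-wise order matches
    the akshara ordering. The real .llf layout may differ.
    """
    # Drop line terminators and skip empty lines.
    cleaned = [line.rstrip(b"\r\n") for line in lines]
    cleaned = [line for line in cleaned if line]

    # Every entry must start with an akshara, not a blank space.
    for number, line in enumerate(cleaned, start=1):
        if line[:1].isspace():
            raise ValueError(f"entry {number} starts with a blank space")

    # Byte strings compare element by element, so the comparison
    # starts at the first code of the first word.
    return sorted(cleaned)
```

Rejecting a leading blank up front mirrors the requirement stated above; a blank would otherwise take part in the comparison and push that line out of its expected position.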
To run the utility, Perl must be available on the system. Perl is free and
can be downloaded and installed on Microsoft Windows or Linux systems. The
utility is invoked as
    prompt> perl sort.pl <input file> <output file>
where "prompt" is the command shell prompt. The utility can be invoked only
from a command shell and has no graphical user interface. The input and
output file names must be specified explicitly, including the .llf
extension.
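For scripted use, the invocation can be wrapped in a small Python helper. This is a sketch under stated assumptions: the helper names build_sort_command and run_sort are hypothetical, sort.pl is assumed to be in the current directory, and perl is assumed to be on the search path.

```python
import subprocess


def build_sort_command(input_path, output_path, script="sort.pl"):
    """Return the argument list for: perl sort.pl <input file> <output file>."""
    # Both file names must carry the .llf extension explicitly.
    for path in (input_path, output_path):
        if not path.endswith(".llf"):
            raise ValueError(f"{path!r} does not end with .llf")
    return ["perl", script, input_path, output_path]


def run_sort(input_path, output_path):
    """Run the utility; raises CalledProcessError if perl exits with an error."""
    subprocess.run(build_sort_command(input_path, output_path), check=True)
```

For example, run_sort("words.llf", "sorted.llf") would execute the same command as typing it at the prompt; both file names here are placeholders.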
A sample input file
containing words in Tamil is available for checking out the utility. This
file has been prepared using the Multilingual Editor.
Issues specific to Indian languages/scripts
There is some confusion about the correct lexicographic ordering of the
vowels, as seen from the ordering given in different dictionaries.
In some cases, the two support vowels "am" and "aha" are placed before the
first vowel "a". The generic consonant (a consonant without a vowel) is
sometimes placed ahead of its combinations with the vowels. Which scheme is
correct is debatable. The IITM software places the support vowels at the
end. Please see the ordering shown below.
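The difference between the two conventions for the support vowels can be illustrated with a short Python sketch. The romanized vowel labels other than "a", "am" and "aha" are assumptions made for the example, as are the function names; the list is a simplified inventory and not the ordering table used by the IITM software.

```python
# Hypothetical romanized labels, for illustration only.
VOWELS = ["a", "aa", "i", "ii", "u", "uu", "e", "ai", "o", "au"]
SUPPORT_VOWELS = ["am", "aha"]


def vowel_order(support_first=False):
    """Return the vowel sequence under one of the two conventions."""
    if support_first:
        # Convention seen in some dictionaries: support vowels before "a".
        return SUPPORT_VOWELS + VOWELS
    # Convention described for the IITM software: support vowels at the end.
    return VOWELS + SUPPORT_VOWELS


def sort_vowels(items, support_first=False):
    """Sort vowel labels by their rank in the chosen convention."""
    rank = {vowel: i for i, vowel in enumerate(vowel_order(support_first))}
    return sorted(items, key=rank.__getitem__)
```

With the default setting, sort_vowels(["am", "a", "i"]) gives ["a", "i", "am"]; with support_first=True the same input gives ["am", "a", "i"].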