Sorting Utility
(PERL application to sort text)
  The multilingual software applications developed at IIT Madras are built around a text representation which corresponds to syllables (aksharas). This coding scheme preserves the lexicographic ordering of the aksharas and conforms to the ordering specified for the vowels and consonants. The sort utility is a PERL program which may be used to sort text in a .llf file. Such a file could be prepared using the Multilingual Editor or may even be generated electronically by indexing text files in Indian scripts.

  The utility may be used to sort the words in the input file and generate a sorted version as output. It is assumed that the input file is made up of lines of text, where each line has one or more words. Sorting always starts from the first akshara of the first word on each line. For the sorting to work properly, each line must start with an akshara and not a blank space.
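
  The line-based sorting described above can be sketched in Python as follows. This is an illustration only and not the utility's PERL code. The .llf file format is not described here, so the sketch assumes that each line has already been decoded into a list of integer akshara codes whose numeric order matches the lexicographic order. The names BLANK and sort_lines, and the code values used, are hypothetical.

```python
# Hypothetical code for a blank space; the real .llf value is not given here.
BLANK = 0

def sort_lines(lines):
    """Return the lines sorted by their akshara codes, first akshara first.

    Each line is a list of integer codes. Because the coding scheme is
    assumed to preserve lexicographic order, comparing the code lists
    element by element gives the desired ordering.
    """
    for number, line in enumerate(lines, start=1):
        # The utility requires every line to start with an akshara.
        if not line or line[0] == BLANK:
            raise ValueError(f"line {number} does not start with an akshara")
    return sorted(lines)  # Python lists compare element by element

# Made-up codes: three lines of one or two words each.
sample = [[40, 12, BLANK, 33], [12, 40], [40, 12]]
print(sort_lines(sample))
```

  The check on the first code mirrors the requirement that no line may begin with a blank space; a leading blank would otherwise take part in the comparison and misplace the line.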

  To run the utility, PERL must be available on the system. PERL is available for free and can be downloaded and installed on Microsoft Windows or Linux systems. The utility is invoked as

prompt> perl <input file>  <output file>

  where "prompt" is the command shell prompt. The utility can be invoked only from a command shell and does not have a graphical user interface. The input and output file names must be specified explicitly, including the .llf extension.

  A sample input file containing words in Tamil is available for trying out the utility. This file has been prepared using the Multilingual Editor.

Sorting issues specific to Indian languages/scripts

  There is some confusion about the correct lexicographic ordering of the vowels as seen from the ordering given in different dictionaries.

  In some cases, the two support vowels "am" and "aha" are placed before the first vowel "a". The generic consonant (a consonant without a vowel) is sometimes placed ahead of its combinations with the vowels. Which scheme is correct is debatable. The IITM software places the support vowels at the end. Please see the ordering shown below.

  The sort utility can handle two different ways of ordering. By default, the generic consonant comes at the end, after its combinations with the vowels, and the utility produces output in that order. If the generic consonant should instead be placed ahead of all its combinations with vowels, the utility may be invoked with a special parameter to achieve this.

prompt> perl  <input file>  <output file> [rotate]

  The rotate directive tells the utility that the sorting must begin with the generic consonant and not its combination with "a". (Note: The square brackets should not be typed in with the command.)
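
  One way to picture the effect of rotate is as a change of sort key. The Python sketch below is not the utility's PERL code. It assumes, purely for illustration, that each akshara is a pair (consonant index, vowel index), with the hypothetical value GENERIC marking a consonant without a vowel. The slot count NUM_VOWELS and the function names are also assumptions.

```python
NUM_VOWELS = 16        # hypothetical number of vowel slots per consonant
GENERIC = NUM_VOWELS   # hypothetical slot for the generic consonant (last by default)

def akshara_key(akshara, rotate=False):
    """Sort key for one (consonant, vowel) pair."""
    consonant, vowel = akshara
    if rotate:
        # Rotate the vowel slots by one so the generic form comes first
        # and the combination with "a" (slot 0) moves to second place.
        vowel = (vowel + 1) % (NUM_VOWELS + 1)
    return (consonant, vowel)

def sort_words(words, rotate=False):
    """Sort words, each given as a list of (consonant, vowel) pairs."""
    return sorted(words, key=lambda w: [akshara_key(a, rotate) for a in w])

# One consonant with "a", with a later vowel, and without any vowel.
ka, ki, k = (1, 0), (1, 2), (1, GENERIC)
words = [[k], [ki], [ka]]
print(sort_words(words))               # generic consonant last
print(sort_words(words, rotate=True))  # generic consonant first
```

  Only the position of the generic consonant changes between the two calls; the relative order of the vowel combinations stays the same.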

Points to remember

1. This utility may be used to sort words prepared using the IITM software applications such as the Multilingual Editor. It cannot be used as a general sort utility for plain ASCII text.

2. You will require PERL to be available on your system to run the utility. The utility will work on Microsoft Windows systems as well as Linux.



