have been created by indexing the text of Kural, sorting the words in the
required dictionary order and then splitting them into individual files.
The text of Kural used for data entry is a version in which words are split
according to established conventions. As a result, many of the
words may begin with letters that really belong to the previous word.
No effort has been made to identify such situations. Typically, many words
starting with "nna", "ya", "zha", "lla", "rra" and "na(*)" may not be proper
words, but one can discern the word correctly with some experience.
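The index, sort and split steps described above can be sketched roughly as follows. This is only an illustration in Python, not the IITM software itself: the function name build_wordlist and the sample words are hypothetical, and plain code-point sorting stands in for the required dictionary order, which a real Tamil wordlist would need to handle separately.

```python
from collections import defaultdict


def build_wordlist(lines):
    """Collect the distinct words, sort them, and group them by first character."""
    words = set()
    for line in lines:
        # Words are assumed to be separated by whitespace in the input text.
        words.update(line.split())

    groups = defaultdict(list)
    # sorted() uses code-point order here; this is an assumption standing in
    # for the dictionary order used for the actual wordlist.
    for word in sorted(words):
        groups[word[0]].append(word)
    return dict(groups)


# Placeholder words, not taken from the Kural.
sample = ["banana apple cherry", "avocado blueberry apple"]
wordlist = build_wordlist(sample)
```

Each group could then be written to its own file, one per starting letter. Because the grouping looks only at the first character, a word whose first letter really belongs to the previous word lands under the wrong letter, which is the situation noted above.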
The main reason for this exercise is to show the fine text processing
capabilities of the IITM software, with which concordances and indexing
can be accomplished. Wordlists are always of interest to linguistic
scholars and, perhaps for the first time, the wordlist for Kural is being
made available in Tamil script on the web. One would have seen many lists
giving only the first words of the couplets, but here nearly 6000 words
have been included.
The viewer may want to do a frequency analysis of the words. This could
be done, for instance, by writing a utility in Perl. We have done this,
and a separate page is available giving the results of the analysis.
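A frequency analysis of this kind can be sketched in a few lines. The example below uses Python rather than Perl and is not the utility mentioned above; the function name word_frequencies and the sample words are hypothetical, and the input is assumed to be lines of whitespace-separated words.

```python
from collections import Counter


def word_frequencies(lines):
    """Count how often each whitespace-separated word occurs in the given lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


# Placeholder words, not taken from the Kural.
sample = ["alpha beta alpha", "gamma beta alpha"]
frequencies = word_frequencies(sample)
top_two = frequencies.most_common(2)
```

The most_common method returns the words ordered from the most frequent to the least frequent, which is usually the form in which such results are tabulated.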
Select a starting letter by clicking
on the link shown below the letter of interest.