IITM Software: Editor for Urdu, Arabic and Hebrew
Text Editor for Scripts written right to left

  The multilingual software from IIT Madras includes a special version of the editor for languages such as Urdu, Arabic, Hebrew and Avestan, which are written right to left on a page. The version described here is a preliminary one with minimal capabilities for handling right-to-left text. The Editor serves to illustrate the principle of syllable-level coding developed in the lab for writing systems that are based on syllables.

  The Editor will work with Microsoft Windows systems and does not require the presence of the Arabic or Hebrew Language kits. The editor is a standalone application and uses fonts developed at IIT Madras.

  Besides allowing text to be entered in these languages, the editor also supports data entry in Roman and will allow automatic transliteration of Urdu, Arabic or Hebrew text into Roman Diacritics. During data entry, the keyboard may be toggled from Urdu, Arabic or Hebrew to English and vice versa using a function key. Thus the editor is at least Bilingual and can support four scripts in any one document.

  The text prepared with the editor may be copied into Microsoft Word and other applications supporting the Rich Text Format. Thus applications such as email, creating web pages in Arabic etc., may be very easily handled by copying text into the standard applications.

  The speech-enhanced version of the editor will allow visually handicapped persons to use the editor. At present the synthesized speech may sound flat, lacking in intonation and with a foreign accent, but this can be improved by using appropriate databases with the MBROLA speech engine, which is used in the application.

  Given below are some screen shots of the Editor in operation. For those familiar with the standard left to right scripts, the editor will appear a bit confusing to begin with. This is largely due to the direction of text entry and the use of the arrow keys. One will have to follow proper procedures for data entry when switching to English. A line of text may combine both English and the Semitic scripts.

 
  This version is not yet free of bugs, but it appears that people may use it to advantage as it stands (as of Sep. 2002).

  The design of the editor takes into account the syllabic structure of text in these languages. The internal representation uses units of storage that map directly to syllables, which provides a meaningful basis for linguistic processing.

  The editor automatically adjusts the displayed letters to conform to the different shapes that depend on the location of the letter (syllable) within a word. Besides the letters, a useful set of punctuation marks is supported. The entry of numerals requires that the user type in the digits in reverse order.
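
  The position-dependent shaping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the editor's own code: it works on Unicode code points rather than the IITM syllable codes, the names RIGHT_JOINING_ONLY and positional_forms are hypothetical, and only six non-connecting letters are listed.

```python
# Hypothetical sketch: pick the positional form of each letter in a word.
# Assumption: letters are Unicode code points, not the editor's syllable codes.

# Letters that connect to the preceding letter but never to the following one:
# alef, dal, thal, reh, zain, waw (a partial list, for illustration only).
RIGHT_JOINING_ONLY = set("\u0627\u062f\u0630\u0631\u0632\u0648")


def positional_forms(word):
    """Return 'isolated', 'initial', 'medial' or 'final' for each letter."""
    forms = []
    for i, letter in enumerate(word):
        # A letter joins its predecessor only if that predecessor can join forward.
        joins_prev = i > 0 and word[i - 1] not in RIGHT_JOINING_ONLY
        # A letter joins its successor only if it can join forward itself.
        joins_next = i < len(word) - 1 and letter not in RIGHT_JOINING_ONLY
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


# kaf, teh, alef, beh (the word "kitab", written without vowel marks)
print(positional_forms("\u0643\u062a\u0627\u0628"))
```

  For the four-letter word in the example, the sketch assigns the initial, medial, final and isolated forms in that order, because alef does not connect to the letter that follows it.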

  The editor does not support word processing features. This is not a limitation, for by copying the entered text into Word, the required formatting of text can be accomplished. The editor saves the files in two independent formats, one conforming to the syllabic structure of the entered text and the other in the Rich Text Format. The RTF file may be imported into Windows applications as well.

  In the present version of the editor, the keyboard mapping used is based on a phonetic relationship with the keys on the standard ASCII keyboard. This mapping may seem a bit arbitrary but it is possible to accommodate other mappings as well without having to recompile the application. The figure below illustrates the mapping used in the editor. Though vowel marks may not be explicitly shown in Arabic and Hebrew text printed today, the editor allows these marks to be properly displayed.
 

  A few changes have been made to the above keyboard mapping and the trial version has the letters corresponding to M and B shifted to "[" and "]" respectively. 
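
  One way to keep a mapping changeable without recompiling is to hold it as data rather than code. The sketch below assumes a hypothetical two-column text format (key, then letter name); the functions parse_keymap and move_letter and the placeholder letter names are illustrative and do not describe the editor's actual files.

```python
# Hypothetical sketch: a keyboard mapping kept as data, so it can be changed
# without recompiling. The two-column text format is an assumption.

def parse_keymap(text):
    """Parse lines of the form '<key> <letter-name>' into a dictionary."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        key, letter = line.split(None, 1)
        mapping[key] = letter
    return mapping


def move_letter(mapping, old_key, new_key):
    """Return a copy of the mapping with one letter moved to another key."""
    updated = dict(mapping)
    updated[new_key] = updated.pop(old_key)
    return updated


# Letter names below are placeholders, not the editor's actual assignments.
base = parse_keymap("""
m letter-on-M
b letter-on-B
k letter-on-K
""")
trial = move_letter(move_letter(base, "m", "["), "b", "]")
print(trial)
```

  The two move_letter calls mirror the change described above, where the letters on M and B are shifted to "[" and "]"; the original table is left untouched.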

Features of the r2leditor.

  When invoked, either from a shell prompt or by clicking on its icon, the editor opens a window which may be resized to suit the user's requirements. Urdu is retained as the default script. The cursor is positioned at the top right corner of the window. Data entry may begin on the first line.

  As data is entered, the text is rendered from right to left, conforming to the shapes of the consonants depending on their position in the word. A word ends when a space or a special punctuation mark is entered.

  Data entry in Roman is accomplished by using the Function key F9 as a toggle key. When typing data in English, the cursor will remain in the same position and the English string will move left as each letter is typed in. Arabic or Urdu may be continued beyond the English string by toggling F9 once more. The cursor will jump to the leftmost character of the English string and Arabic or Urdu letters will be rendered right to left one letter after another.
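
  The behaviour described above can be modelled by storing text in the order it was typed and computing the screen order separately. The following is a minimal sketch under that assumption; the run representation and the name visual_line are hypothetical, and upper-case Latin letters stand in for Arabic or Urdu letters so that the output is easy to read.

```python
# Hypothetical sketch: screen order of a right-to-left line with mixed runs.
# Upper-case Latin letters stand in for Arabic or Urdu letters here.

def visual_line(runs):
    """Return the left-to-right screen order of a right-to-left line.

    `runs` lists (direction, text) pairs in the order they were typed.
    """
    pieces = []
    # The first run typed sits at the right edge, so walk the runs backwards.
    for direction, text in reversed(runs):
        # Right-to-left runs are mirrored; Roman runs keep their own order.
        pieces.append(text[::-1] if direction == "rtl" else text)
    return "".join(pieces)


typed = [("rtl", "ABC"), ("ltr", "xyz"), ("rtl", "DE")]
print(visual_line(typed))  # EDxyzCBA
```

  In the output, the Roman string reads left to right and sits to the left of the first Arabic run, while the Arabic typed after the second F9 toggle continues at the left end of the Roman string.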

When a line contains text in Arabic as well as English, certain conventions are observed.

  • The cursor will move left after each Arabic letter is entered. A carriage return will bring the cursor to the next line so that another line of text may be entered.
  • Toggling F9 in the middle of data entry will permit Roman letters to be typed in along with Arabic. The cursor will not move but the entered characters will. Thus characters will be inserted into the text when typing in English. When the ENTER key is pressed, the cursor will move to the next line, leaving the English string in place even though the cursor was in the middle of the line.
  • The right and left arrow keys must be understood and used according to the logical function they perform.
  • Right arrow - move the cursor to the character that was entered following the current character. Left arrow - move the cursor to the character that was entered prior to the current character.
  • Movement of the cursor inside a string in Arabic will be the opposite of what a person used to typing in English will see. The cursor will move to the left when the right arrow is pressed, for the next Arabic character entered would be at the left. Users may want to experiment on what happens when the cursor is positioned with an Arabic character on its right and a Roman letter on its left.
  • It is recommended that, as far as possible, a line of text should be in one script. If English must follow, then the Arabic text should be entered first, followed by English on the same line, but further Arabic text beyond the English string should be avoided. Text where each line is in one script only, either Arabic or English, will be the best choice.
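
  The logical behaviour of the arrow keys listed above can be sketched as follows. This is an assumed model rather than the editor's implementation: the cursor is a logical index into the typed text, the line holds Arabic only, and the names press_arrow and screen_column are hypothetical.

```python
# Hypothetical sketch: logical arrow keys on a line holding only Arabic text.
# The cursor is a logical index: 0 is before the first letter typed.

def press_arrow(index, key, length):
    """Right arrow goes to the letter typed later, left arrow to the earlier one."""
    step = 1 if key == "right" else -1
    return max(0, min(length, index + step))  # stay inside the line


def screen_column(index, length):
    """Column of the cursor counted from the left edge of the text."""
    # The first letter typed is at the right edge, so the columns run backwards.
    return length - index


length = 5
index = 2
before = screen_column(index, length)
index = press_arrow(index, "right", length)
after = screen_column(index, length)
print(before, after)  # 3 2
```

  Pressing the right arrow advances the logical index, yet the screen column decreases, which is the leftward movement a user sees inside an Arabic string.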

    Copying and pasting into Microsoft Word.

      Text entered using the r2leditor may be copied and pasted into Microsoft Word. Text alignment will be retained and further formatting may be attempted in Word. The screen shots shown below illustrate this.


    On screen transliteration.

      A string of text in Arabic may be automatically transliterated into Roman diacritics by selecting (blocking) the string and invoking the language switch menu. Switching from Roman diacritics back to Arabic is not supported, however.

    The following steps indicate how on-screen transliteration is done.

    • Step 1: Type in the required Arabic text.
    • Step 2: Make a copy of the text if you would like to show both the Arabic text and, below it, the transliteration.
    • Step 3: Select the text to be transliterated using the mouse, invoke the language menu and select IPA. The selected line will change in script from Arabic to Roman diacritics. The reverse operation will not work.
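
    The conversion performed in Step 3 can be pictured as a table lookup applied letter by letter. The sketch below is an assumption made for illustration: the four-entry ROMAN table and the function transliterate are hypothetical, the letters are Unicode code points rather than the editor's syllable codes, and the Roman equivalents shown are not claimed to match the scheme used by the editor.

```python
# Hypothetical sketch: one-way, table-driven transliteration.
# The table is a tiny illustrative subset, not the editor's actual scheme.
ROMAN = {
    "\u0643": "k",       # kaf
    "\u062a": "t",       # teh
    "\u0627": "\u0101",  # alef -> a with macron
    "\u0628": "b",       # beh
}


def transliterate(text):
    """Replace each known letter; pass everything else through unchanged."""
    return "".join(ROMAN.get(ch, ch) for ch in text)


print(transliterate("\u0643\u062a\u0627\u0628"))  # ktāb
```

    Because the table is plain data, a different Roman scheme could be tried by swapping the entries without touching the function.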




    What IIT Madras views as limitations in the present version of the r2leditor.
    1. The keyboard mapping is phonetic, following a scheme which is typically suited to the sounds of Indian languages. It is possible to assign any appropriate mapping and thus honour existing keyboard mapping schemes for Urdu and Arabic. We do not yet know which mappings will be considered meaningful.

    2. In the current version, the text entered in Arabic or Urdu is coded into syllables following the lexical ordering for Indian languages. The lexical ordering for Arabic (and Hebrew) is different. Hence the sorting utility supplied with the IITM software may not sort text as required. This is a problem which may be handled without much difficulty later.

    3. It is not guaranteed that the editor will correctly render Arabic or Urdu text consistent with the shapes used in traditional writing. The font used by the Editor is a TrueType font which is freely available on the net. This font is meant for Urdu and hence the shapes for the vowel marks differ from the corresponding ones for Arabic. The "Hamza" is also not handled properly as of now. This may be a major limitation as far as Arabic is concerned.

    IIT Madras has a proposal to design a new font consistent with the requirements for both Arabic and Urdu.

    4. The scheme of transliteration used in the current version of the Editor may not present the correct equivalents. We are still looking for schemes acceptable to all in respect of Roman transliteration for Arabic and Urdu. We have seen many different schemes such as "Qalam Arabic Transliteration" which seem to be popular. The scheme used in the examples above corresponds to the ArabTeX specification. It will be possible to accommodate almost any desired scheme so long as the diacritics used are chosen from the standard set as per conventions used in the past (e.g., books printed in the 19th century). The online transliteration feature is included only to show that it can be done with the editor.
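
    The sorting issue noted in limitation 2 can be illustrated with a small sketch. Everything in it is an assumption made for illustration: letters are written as names, both orderings are short partial lists (the second merely imitates the vowel, velar, palatal, dental, labial sequence of Indian scripts), and sort_words is a hypothetical helper, not the IITM sorting utility.

```python
# Hypothetical sketch: the same words sorted under two lexical orderings.
# Letters are written as names; both orderings are partial and illustrative.
ARABIC_ORDER = ["alef", "beh", "teh", "jeem", "dal", "kaf", "lam", "meem"]
INDIC_STYLE_ORDER = ["alef", "kaf", "jeem", "teh", "dal", "beh", "meem", "lam"]


def sort_words(words, order):
    """Sort words (tuples of letter names) by the given alphabet order."""
    rank = {letter: i for i, letter in enumerate(order)}
    return sorted(words, key=lambda word: [rank[letter] for letter in word])


words = [
    ("kaf", "teh", "alef", "beh"),
    ("beh", "alef", "beh"),
    ("teh", "alef", "jeem"),
]
print([w[0] for w in sort_words(words, ARABIC_ORDER)])       # ['beh', 'teh', 'kaf']
print([w[0] for w in sort_words(words, INDIC_STYLE_ORDER)])  # ['kaf', 'teh', 'beh']
```

    The two runs return the same words in opposite order, which suggests why a utility built around one ordering may not sort text as readers of the other script expect, and why supplying a different order table may be enough to address it.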


    Distribution of the Editor

    A trial version of the editor is available for evaluation. Those interested in getting a copy may send a request to the lab at the address given in the contact IIT pages.

    (It is necessary to mention here that there are still many bugs to be fixed in the editor. Though Arabic, Urdu and Roman may be typed in easily, unpredictable results will be seen if one tries to edit an already prepared file containing bilingual data. However, the editor can be used to prepare text for pasting into Word.)