Software for the Visually Handicapped

Speech Synthesis and Text to Speech output in Indian Languages.

    The problem of text to speech output has received considerable attention in recent years. Several good solutions are available, in hardware as well as in software. The phonetic nature of Indian languages allows phoneme based speech synthesis to produce sufficiently clear output. A simple concatenation of phoneme sounds is surprisingly effective, though it lacks intonation.

   The IITM software uses a syllable level representation of the text, and each syllable translates directly into a sound that can be synthesized or simply played from a prerecorded piece of audio.
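The idea of playing each syllable from prerecorded audio can be sketched as follows. This is only an illustration and not the IITM implementation: the function names, the syllable labels and the 16 kHz mono 16-bit sample format are all assumptions made for the example.

```python
import io
import struct
import wave

def concatenate_syllables(syllables, clips):
    """Join prerecorded clips (lists of 16-bit PCM samples) in syllable order."""
    samples = []
    for syllable in syllables:
        samples.extend(clips[syllable])
    return samples

def write_wav(samples, target, sample_rate=16000):
    """Write mono 16-bit PCM samples to a file path or a file-like object."""
    with wave.open(target, "wb") as out:
        out.setnchannels(1)          # mono
        out.setsampwidth(2)          # 2 bytes per sample (16-bit)
        out.setframerate(sample_rate)
        out.writeframes(struct.pack("<%dh" % len(samples), *samples))

# Hypothetical clip store: syllable label -> recorded samples.
clips = {"ka": [0, 1200, -1200], "ma": [300, -300]}
samples = concatenate_syllables(["ka", "ma", "ka"], clips)
buffer = io.BytesIO()
write_wav(samples, buffer)
```

Plain concatenation like this carries no intonation, which matches the limitation noted above.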

   To begin with, the IITM development team has experimented with the MBROLA speech synthesizer and has found that clear speech output may be obtained using its phoneme and diphone based approach to speech synthesis. MBROLA produces synthesized speech from a phonetic representation of the text known as the .pho format. A text to speech system uses some algorithm to convert the text into a .pho file, which is fed to the synthesizer. Because the .pho file describes the sound output symbolically instead of storing audio samples, it is remarkably compact and in effect serves as a very efficient compression mechanism as well.
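As a rough illustration of what a .pho file contains: each line names a phoneme, gives its duration in milliseconds, and may add pairs of (position in percent, pitch in Hz); lines starting with a semicolon are comments and "_" denotes silence. The helper names below and the particular phonemes, durations and pitch values are assumptions chosen for the example, not output of the IITM software.

```python
def pho_line(phoneme, duration_ms, pitch_points=()):
    """Format one .pho line: phoneme, duration, then optional pitch pairs."""
    parts = [phoneme, str(duration_ms)]
    for position_pct, pitch_hz in pitch_points:
        parts += [str(position_pct), str(pitch_hz)]
    return " ".join(parts)

def build_pho(segments, comment="illustrative sketch"):
    """Build the text of a .pho file from (phoneme, duration[, pitch]) tuples."""
    lines = ["; " + comment]
    lines += [pho_line(*segment) for segment in segments]
    return "\n".join(lines) + "\n"

# Illustrative segments only; real phoneme symbols depend on the voice database.
segments = [
    ("_", 100),
    ("n", 80),
    ("a", 120, [(50, 120)]),
    ("_", 100),
]
print(build_pho(segments), end="")
```

A few dozen bytes of such text describe a stretch of speech that would take several kilobytes as sampled audio, which is why the format is so compact.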

  The IIT Madras software has a utility to convert the local language representation (.llf) to a .pho file by a simple table lookup. Speech output may therefore be obtained on the fly, efficiently and quickly. MBROLA supports multilingual speech output by allowing programs to choose the diphone database for the language dynamically. Unfortunately, the databases required for Indian languages are not yet available for use with MBROLA. Hence the IITM team has experimented with other available databases whose phonemes are close to the phonemes of Indian languages. We have found that the phonemes of Swedish are well suited to producing speech output in Indian languages. English and American voices do not quite accommodate the pronunciation required for the Indian Aksharas.
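The table lookup step might look roughly like the sketch below. The .llf format itself is not described here, so the example assumes the text has already been decoded into a list of syllable labels; the table entries, phoneme symbols and durations are hypothetical and would in practice be chosen to match the voice database in use.

```python
# Hypothetical lookup table: syllable label -> list of (phoneme, duration in ms).
SYLLABLE_TABLE = {
    "na": [("n", 70), ("a", 110)],
    "ma": [("m", 70), ("a", 110)],
    "ste": [("s", 80), ("t", 60), ("e", 110)],
}

def syllables_to_pho(syllables, table=SYLLABLE_TABLE, pause_ms=100):
    """Convert syllable labels to .pho text by direct table lookup."""
    lines = ["_ %d" % pause_ms]              # leading silence
    for syllable in syllables:
        for phoneme, duration in table[syllable]:
            lines.append("%s %d" % (phoneme, duration))
    lines.append("_ %d" % pause_ms)          # trailing silence
    return "\n".join(lines) + "\n"

print(syllables_to_pho(["na", "ma", "ste"]), end="")
```

Since each syllable costs only one dictionary lookup, the conversion is fast enough to run on the fly; an unknown syllable raises KeyError in this sketch, where a real converter would need a fallback.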

  You can get an idea of the quality of the synthesized speech output by listening to the audio clip of the passage shown below. There are three languages in the text: Tamil, Telugu and Hindi. The audio is in the RealAudio format.

The synthesized speech output corresponding to the above passage is available in Real Audio format and in mp3 format.

  You may also download the .pho file corresponding to this output and play it on your own system if you have MBROLA installed. The .pho file is a compact representation of the required sound output. You will have to use the Swedish voice database (SW1) when playing the downloaded .pho file.
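Playing a downloaded .pho file amounts to running the MBROLA program with a voice database, the .pho file and an output audio file as arguments. The sketch below assumes that the executable is on the PATH under the name mbrola and that the Swedish voice sits at sw1/sw1; both locations, and the file names, are assumptions that depend on the local installation.

```python
import subprocess

def mbrola_command(voice_db, pho_file, wav_file, executable="mbrola"):
    """Build the argument list: executable, voice database, input, output."""
    return [executable, voice_db, pho_file, wav_file]

def synthesize(voice_db, pho_file, wav_file):
    """Run MBROLA; raises CalledProcessError if the program reports failure."""
    subprocess.run(mbrola_command(voice_db, pho_file, wav_file), check=True)

# Hypothetical paths; adjust them to where the voice and the .pho file live.
command = mbrola_command("sw1/sw1", "sample.pho", "sample.wav")
print(" ".join(command))
```

Calling synthesize("sw1/sw1", "sample.pho", "sample.wav") would then write the synthesized speech to sample.wav, which any audio player can open.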

  The development team at IIT Madras has made available several useful applications in Indian languages which permit speech output as a part of the user interface. These also include standard applications in English which run in text mode under DOS as well as Linux. Many of the applications in English have also been enabled to run under Windows 9X/2000/XP, with screen reading functions supported by JAWS for DOS, which has been adapted to work on a PC without the need for an external synthesizer.

  This approach permits virtually any text based internet application (typically those running under Unix) to also work with Windows, through the use of the Unix to Windows porting tools provided by Cygwin. IIT Madras has developed the interface that lets JAWS for DOS work with a sound card on Windows 9X/2000/XP systems, and this opens up several possibilities for the visually handicapped to learn to use computers. Many of the applications support Indian language based user interfaces, and these also offer text to speech capabilities.

  Given below is a list of useful applications for the visually handicapped, developed at the Systems Development Laboratory, IIT Madras. Among these, the English based applications were already available for use under the UNIX platform and have been ported to the Windows environment and enhanced to work with the sound hardware on a PC.

1. The Multilingual Editor with speech enhancements.
2. A sound enhanced Web Browser based on Lynx.
3. PC Pine working with JAWS for DOS.
4. A utility to speak out text from a file prepared using the IITM software.
5. JAWS for DOS without an external synthesizer.

  All these applications are distributed free for the benefit of the visually handicapped in the Southeast Asian region. The applications themselves are discussed in separate pages at this web site.

   It must be noted that MBROLA is a superb piece of software providing high quality voice output. As of now, we have not generated a special Indian language database for use with MBROLA but have used one of the existing voice databases (Swedish). With suitable Indian language databases for MBROLA, the speech output can come very close to natural speech and can be improved to sound more like the way Indians speak. This work is being taken up and will be completed in due course. IIT Madras would be very happy to interact with other teams working in the area of speech synthesis using MBROLA.


About Speech Synthesis

(An introduction to text to speech synthesis by the designer of MBROLA)



Multilingual applications enhanced with Speech output

1. The Multilingual Editor

2. Speak text from a .llf file.

3. Sound Enhanced Lynx
   (text based web browser)

4. JAWS for DOS
   (screen reading software adapted to work on a PC with a sound card)

The applications mentioned above are discussed in the section on Software for the Disabled.



