A tutorial on Fonts for Indian languages.

  This presentation is a tutorial on fonts for Indian languages and scripts. Over the years, many different fonts have been introduced for data entry, display and printing of documents in Indian scripts. Each font is associated with a specific data entry scheme recommended by its designer, and the accompanying software is often platform specific. Also, each designer has included in the font a set of glyphs that is adequate for simple publications. The idea behind this tutorial is to give our viewers an insight into the vagaries of fonts, as well as into the attempt by IIT Madras to provide a minimum level of standardization in designing fonts for the scripts.

  This tutorial covers the following topics.


A set of fonts (for 11 scripts) with compliments from IIT Madras
(Bengali, Devanagari, Gurmukhi, Gujarati, Kannada, Malayalam, Oriya, Tamil, Telugu, Roman Diacritics and Urdu).

The set of fonts is provided in these formats.
TrueType, PostScript (pfb), Unix (BDF) and Macintosh TrueType.

Points to remember

Fonts for Indian languages cannot be based on a standard encoding designed for syllabic writing systems, because no such encoding exists.

Font encoding makes sense only when one character code maps into one glyph. For writing systems which are based on syllables, the character string making up a syllable has to be mapped into a single shape corresponding to that syllable. Hence, fonts traditionally designed for Indian languages include just a collection of basic shapes for the vowels, consonants and other ligatures used in the writing system. The application has the responsibility of figuring out which glyphs should be combined to generate the shape for a syllable. Thus the concept of the character set does not apply.
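The application's responsibility described above can be sketched as follows. This is only an illustration, loosely modelled on the way Devanagari conjuncts are written: the glyph names, the slot numbers and the rule for choosing half forms are all hypothetical and are not taken from any actual font.

```python
# Hypothetical 8 bit font: each slot holds one basic shape, not a whole syllable.
GLYPH_SLOTS = {
    "ka": 0x41,        # full consonant shape
    "ta": 0x47,        # full consonant shape
    "half_sa": 0x52,   # half form used inside a conjunct
    "matra_i": 0x60,   # vowel sign drawn to the left of the consonant
    "matra_aa": 0x61,  # vowel sign drawn to the right of the consonant
}

# Vowel signs that are drawn before the consonant shapes (assumed rule).
PREFIX_SIGNS = {"i"}


def glyphs_for_syllable(consonants, vowel_sign=None):
    """Return the list of glyph slots the application must draw for one syllable."""
    # Every consonant except the last takes its half form; the last is drawn in full.
    names = ["half_" + c for c in consonants[:-1]] + [consonants[-1]]
    if vowel_sign is not None:
        sign = "matra_" + vowel_sign
        if vowel_sign in PREFIX_SIGNS:
            names.insert(0, sign)
        else:
            names.append(sign)
    return [GLYPH_SLOTS[name] for name in names]


# One syllable ("sti") turns into three glyphs from the font.
print([hex(slot) for slot in glyphs_for_syllable(["sa", "ta"], "i")])
```

The point of the sketch is that the mapping from a syllable to glyphs lives in the application, not in the font, so two fonts with different basic shapes need two different sets of rules.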

Virtually no standardization is possible in Indian language fonts because variations in the representation of a syllable are permitted.

The concept of the font
  In simple terms, a font provides for displaying a set of symbols through well defined shapes for each symbol. The symbol is a generic concept and the font is one specific representation of a set of symbols. Traditionally, the symbols mentioned here have been the letters of the alphabet in a particular language along with punctuation marks and special characters. Fonts used to be created by craftsmen and artists during the days of printing machines that used movable type. Today, fonts are created by artists and designers who work with computer based tools.

   In a font, the specific shape for a symbol is described either in terms of a digital image through bit maps or in terms of a filled outline. The former is called a bit mapped font and the latter, an outline font. An outline font specifies the shape for a symbol in mathematical terms using curves. The mathematical description allows the shape to be drawn at different sizes by scaling the parameters suitably. Outline fonts are increasingly being used on account of their scalability. The descriptions result in a pictorial representation or shape for each symbol, which is referred to as a glyph. 
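The scalability of an outline font can be illustrated with a small sketch. The three control points and the 1000 unit design grid below are assumptions made for the example; a real outline would use many more points joined by curves.

```python
# Hypothetical outline: control points on an assumed 1000 unit design grid.
UNITS_PER_EM = 1000
OUTLINE = [(0, 0), (500, 700), (1000, 0)]


def scale_outline(points, size_in_pixels, units_per_em=UNITS_PER_EM):
    """Scale design grid coordinates to the requested size in pixels."""
    return [
        (x * size_in_pixels / units_per_em, y * size_in_pixels / units_per_em)
        for x, y in points
    ]


# The same description yields the shape at any size.
print(scale_outline(OUTLINE, 10))
print(scale_outline(OUTLINE, 48))
```

A bit mapped font has no equivalent of this step: each size needs its own stored image.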

  The number of symbols which are displayable by a font is generally limited by the range of the index used to access a specific shape within the font. Eight bit fonts are limited to 256 glyphs, but in practice computer systems recognize only a subset of these, usually somewhere between 96 and 240 glyphs.

  Given below is a table displaying the glyphs present in the familiar Times-New-Roman font. Notice that the letters of the Roman alphabet along with special characters and punctuation marks are present as glyphs in the font. The glyphs shown here occupy positions from 32 to 255. The first 32 are not assigned in most fonts. 

[Image: table of the glyphs in the Times New Roman font, positions 32 to 255]
 Each glyph in the font is specified by a name as well as a glyph index which locates the glyph in an ordered arrangement of the glyphs. In the fonts for the Roman alphabet, it is common to locate the glyph for a letter in the place corresponding to the ASCII code of the letter. Thus, the glyph for capital b, i.e., "B", will be at position 66 (the locations are numbered from zero). Most of the frequently used glyphs are seen in the first 128 locations of the font. The second half, known as the upper ASCII, usually contains special symbols which are required in printed text to indicate phonetic aspects of a letter, to serve as reference symbols for footnotes, etc.
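The correspondence between ASCII codes and glyph locations can be checked with a few lines of code. The function name is hypothetical, and the 32 to 255 range is the one used in the table described above.

```python
def glyph_position(character):
    """Glyph location for a character in a font laid out by character code."""
    code = ord(character)
    if not 32 <= code <= 255:
        raise ValueError("no glyph assigned at position %d" % code)
    return code


# Text to be displayed becomes a sequence of glyph locations.
print(glyph_position("B"))                  # 66
print([glyph_position(c) for c in "Bat"])   # [66, 97, 116]
```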

  In most computer systems, a provision is available to display text using a font that may be selected by the user. The text to be displayed is represented through the ASCII codes of the characters to be shown. These codes may span the range 32-126, the usual set of values for the letters of the alphabet, or the range 160-254 normally reserved for special symbols. 

  There is no single recommendation on what symbols should get displayed via the upper ASCII range, though the International Organization for Standardization (ISO) has recommended that the glyphs for some of the languages of Europe and the Middle East be assigned these locations. The term character set is often used to refer to the set of numeric codes assigned to the letters of the alphabet of a language. Thus, for a specified language, the code assigned to a letter of the alphabet will be the same in all computers, so that application programs may recognize the letter from its internal code. If the glyph location for that letter also coincides with this code, then a one to one relationship exists between the code for a character and its glyph. For most European languages, one letter invariably gets represented through one glyph.
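As an illustration of how the upper range is assigned differently for different languages, the sketch below assumes that the ISO recommendation referred to is the ISO 8859 family of 8 bit character sets; the text itself does not name the standard. It uses only the codecs that ship with Python.

```python
# One code from the upper range, interpreted under two ISO 8859 character sets.
CODE = 0xE0

as_latin = bytes([CODE]).decode("iso8859_1")    # Western European set
as_hebrew = bytes([CODE]).decode("iso8859_8")   # Hebrew set

print(hex(CODE), repr(as_latin), repr(as_hebrew))

# Within one character set, a letter always has the same code.
code_for_e_acute = "\u00e9".encode("iso8859_1")[0]
print(code_for_e_acute)
```

The same numeric code selects a different letter in each set, which is why a font has to be matched to the character set used by the text.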

  The fixing of glyph locations for a letter of the alphabet has the most important advantage that the text to be displayed may be shown using many different fonts. This is precisely the idea behind word processors permitting selected text to be displayed in a font chosen by the user. 

  As of today, fonts for most of the languages of the world are limited to 8 bit codes for specifying the glyph positions. In other words, the number of symbols (or glyphs) required to display text in most languages is less than 256 and hence 8 bit fonts work well. Many languages of Asia, notably those of Japan, China and Korea, cannot be specified through 8 bit codes for their letters, as there are far too many of them. The Japanese character set includes some 24000 symbols, while most of the scripts of India provide for as many as 12000-14000 individually differing aksharas. Later we will see how the aksharas of Indian languages may still be handled using 8 bit fonts, i.e., fonts supporting only up to 256 glyphs.
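The arithmetic behind the 8 bit limit can be checked directly. The symbol counts below are the figures quoted in the paragraph above; the helper function is hypothetical.

```python
def bits_needed(symbol_count):
    """Smallest index width, in bits, that can address symbol_count glyphs."""
    return max(1, (symbol_count - 1).bit_length())


for label, count in [
    ("8 bit font limit", 256),
    ("aksharas (lower figure quoted)", 12000),
    ("Japanese symbols (figure quoted)", 24000),
]:
    print(label, count, bits_needed(count))
```

An index of 14 or 15 bits would be needed to give every akshara or symbol its own glyph, which is why one glyph per akshara does not fit in an 8 bit font.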

A font consists of a set of Glyphs which are arranged in some order inside the font. It is customary to view this order in the form of a rectangular grid.

A character in a text string could be displayed by using its code (often plain ASCII) to index directly into this array and select the shape. In practice, however, the code is first related to a character in a character set, and the character is identified by its name in that set.

The name is used to identify the location of the glyph with the required shape. This approach allows the designer some freedom in designing the glyphs without having to worry about the different character sets that the same glyph may be associated with.
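The indirection from code to name to glyph location can be sketched as two lookups. The character sets, codes and glyph locations below are hypothetical and chosen only to show that one glyph can serve two character sets.

```python
# Two hypothetical character sets that give the same letter different codes.
CHARSET_ONE = {0x41: "A", 0xE9: "eacute"}
CHARSET_TWO = {0x41: "A", 0x8E: "eacute"}

# The font locates each glyph by name; the locations are arbitrary here.
FONT_GLYPHS = {"A": 36, "eacute": 112}


def glyph_location(code, charset, font=FONT_GLYPHS):
    """Map a character code to a glyph location through the character's name."""
    name = charset[code]   # code -> character name
    return font[name]      # character name -> glyph location


# Different codes, same name, same glyph.
print(glyph_location(0xE9, CHARSET_ONE))
print(glyph_location(0x8E, CHARSET_TWO))
```

Because the font is keyed by name, the designer can arrange the glyphs in any order without knowing which character sets will be used with the font.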

Last updated on 10/25/12