Essay
David Březina

Elements of multi-script typography: paragraphs and pixels

From word shapes to paragraphs

During paragraph composition, words are set one after another in the writing direction (see Figure 1) and separated by word separators (typically a word space) and other punctuation to form lines (see Figure 2). Note that some scripts, e.g. Chinese, do not have established conventions for the use of word separators and punctuation (Daniels & Bright, 1996). The lines then follow the customary order of lines on a page (see Figure 1) to form a paragraph. The end of a paragraph may be marked by a paragraph separator (end of line, blank line, or indent of the following line).

Figure 1: Examples of common contemporary writing directions (in black, the general order of the characters in a word and on a line) and paragraph directions (in red, the order of lines in a paragraph). (1) Left-to-right line and top-to-bottom paragraph direction are used, for example, for English and many languages using the Latin script or for horizontal setting of Chinese. (2) Right-to-left line and top-to-bottom paragraph direction are used for languages using the Arabic script or Hebrew. (3) Top-to-bottom line and right-to-left paragraph direction are used for vertical setting of Chinese or Japanese. (4) Top-to-bottom line and left-to-right paragraph direction are used, for example, for Mongolian. Rare and historical directionalities, such as boustrophedon, are omitted.

Figure 2: The use of a danda (single vertical line) and double danda (two vertical lines) signs as punctuation in Sanskrit texts set in the Devanagari script. Note that Latin-script punctuation is usually used for contemporary languages such as Hindi, Marathi, or Nepali when set in the Devanagari script. The example is from Śrī Bhagavad Sandarbha (Gosvāmī, 2014, p. 7). The font is Skolar Devanagari Regular.

Paragraph alignment creates a major technical challenge. In the horizontal direction, there are four common types of paragraph alignment: aligned-left (also called ragged-right), aligned-right (also called ragged-left), horizontally centred, and horizontally justified. Aligned-left and aligned-right paragraphs align the lines along one edge and leave the other edge ragged. Centred alignment centres the lines, which leaves ragged edges on both sides of the paragraph. Horizontally justified paragraphs align the lines along both edges. Alignment in the vertical direction is analogous (see Figure 3).

Figure 3: Common paragraph alignments, horizontal: aligned-left (1), aligned-right (2), centred (3), and horizontally justified (4); and vertical: top-aligned (5), bottom-aligned (6), vertically centred (7) and vertically justified (8). The bottom-aligned (6) and vertically centred (7) alignments are used sporadically.

Note that the direction of the paragraph alignment is independent of the writing direction. However, scripts’ writing directions and document genres have established preferences for particular paragraph alignments.
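The four horizontal alignments described above can be sketched for a single line of monospaced text. This is only an illustration under simplifying assumptions: every character is one cell wide, the line already fits the measure, and justification uses nothing but extra word space. The function name `align_line` and its `mode` values are made up for this example.

```python
def align_line(words, width, mode):
    """Return one line of exactly `width` monospaced cells, aligned per `mode`."""
    text = " ".join(words)
    if len(text) > width:
        raise ValueError("line does not fit the measure")
    if mode == "left":       # ragged-right
        return text.ljust(width)
    if mode == "right":      # ragged-left
        return text.rjust(width)
    if mode == "centre":     # ragged on both sides
        return text.center(width)
    if mode == "justify":    # aligned along both edges
        gaps = len(words) - 1
        if gaps == 0:
            return text.ljust(width)  # a single word cannot be spread out
        extra = width - sum(len(word) for word in words)
        base, remainder = divmod(extra, gaps)
        parts = []
        for i, word in enumerate(words[:-1]):
            # The first `remainder` gaps receive one additional space each.
            parts.append(word + " " * (base + (1 if i < remainder else 0)))
        parts.append(words[-1])
        return "".join(parts)
    raise ValueError(f"unknown mode: {mode}")


for mode in ("left", "right", "centre", "justify"):
    print(repr(align_line(["set", "in", "type"], 14, mode)))
```

Real paragraph composers work with proportional glyph widths and usually combine word spacing with other techniques, so the uneven gaps produced here are cruder than what typesetting software would output.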

Typographers use various techniques to achieve full justification or to reduce the raggedness of the lines’ edges. Here are some of the common ones that are used to extend or shorten lines to fit the intended width, i.e. to justify them:

To achieve an aesthetically pleasing paragraph setting, you might need to employ several of these techniques at the same time.

Not all of these techniques are available to all scripts. Most notably, the insertion of additional space between characters would break the links in connecting scripts such as Arabic or Devanagari. These scripts use special extending glyphs which make words and lines longer while preserving the connections. Additional spaces or extending glyphs can be inserted only in appropriate places. For alphabetic scripts, such as Latin, Cyrillic, or Greek, they can be inserted between any two letters. For syllabic scripts, such as Devanagari or Tamil, they can be inserted only between syllables (see Figure 4).

Examples of appropriate and inappropriate justification methods

Figure 4: Example words set in (top to bottom): Russian (in the Cyrillic script), Greek (in the Greek script), English (in the Latin script), Arabic (in the Arabic script), and Hindi (in the Devanagari script): default setting (1), tracked with additional space (2), and stretched using an extending glyph (in red) (3). Note that the extending glyph used in the Arabic script (called kashida or tatweel) is not applied uniformly. Also note that the additional space and the extending glyphs in Devanagari are added between syllables rather than individual glyphs. Insertion of an additional space is not an acceptable technique for the Arabic and Devanagari scripts as it breaks the links between characters. On the other hand, extending glyphs are generally not used for the Cyrillic, Greek, or Latin scripts. They are shown here only to further illustrate the absurdity of an inappropriate tracking/stretching technique. The fonts used are Skolar Sans PE Extended Extrabold (the first three lines), Marlik Extrabold, and Ek Mukta Extrabold.
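The script-dependent choice between tracking and extending glyphs can be written down as a small lookup. The sketch below is an assumption-laden illustration, not how any particular composer decides: it guesses the script from the first word of a character's Unicode name, and the table `STRETCH_TECHNIQUE` merely restates the constraints described in this section for the five scripts in Figure 4.

```python
import unicodedata

LETTER_SPACING = "additional space between letters"
EXTENDING_GLYPH = "extending glyph (kashida)"
EXTENDING_BETWEEN_SYLLABLES = "extending glyph between syllables"

# Hypothetical table restating the constraints from the text.
STRETCH_TECHNIQUE = {
    "LATIN": LETTER_SPACING,
    "CYRILLIC": LETTER_SPACING,
    "GREEK": LETTER_SPACING,
    "ARABIC": EXTENDING_GLYPH,
    "DEVANAGARI": EXTENDING_BETWEEN_SYLLABLES,
}


def script_of(word):
    """Guess the script from the Unicode name of the first letter (a heuristic)."""
    for ch in word:
        if ch.isalpha():
            return unicodedata.name(ch, "").split(" ")[0]
    return ""


def stretch_technique(word):
    """Return the stretching technique this section considers acceptable."""
    return STRETCH_TECHNIQUE.get(script_of(word), "unknown")


for sample in ("type", "кот", "λόγος", "كتاب", "हिंदी"):
    print(sample, "->", stretch_technique(sample))
```

Production software would rely on proper script itemisation and on the font's own features rather than on character names.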

Where words are hyphenated depends on the conventions of a particular language and the document genre. For example, Arabic and Persian do not hyphenate words, while Uyghur, which also uses the Arabic script, does (Haralambous, 2021).

To do hyphenation well, the paragraph composer needs to have access to a hyphenation dictionary which defines when and where words can be hyphenated. Contemporary typesetting software offers ways to select a language for a paragraph which, among other things, also sets the correct hyphenation dictionary. If missing, the hyphenation dictionary can be installed.
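The role of a language-specific hyphenation dictionary can be illustrated with a toy lookup. Everything here is hypothetical: the `HYPHENATION` table, its single entry, and the break offsets are illustrative only, and real dictionaries are usually pattern-based and far larger. The empty Arabic entry mirrors the statement that Arabic does not hyphenate words.

```python
# Hypothetical dictionaries: language -> word -> allowed break offsets.
HYPHENATION = {
    "en": {"typography": [2, 5, 7]},  # illustrative breaks: ty-pog-ra-phy
    "ar": {},                          # Arabic: no hyphenation
}


def hyphenate_to_fit(word, room, language):
    """Split `word` so that the first part plus a hyphen fits in `room` cells.

    Returns (head_with_hyphen, tail), or None if no allowed break fits.
    """
    breaks = HYPHENATION.get(language, {}).get(word.lower(), [])
    # Try the latest break first to fill the line as much as possible.
    for offset in sorted(breaks, reverse=True):
        if offset + 1 <= room:  # +1 for the hyphen itself
            return word[:offset] + "-", word[offset:]
    return None


print(hyphenate_to_fit("typography", 6, "en"))
print(hyphenate_to_fit("typography", 6, "ar"))
```

Selecting the wrong language for a paragraph would make the lookup consult the wrong table, which is why the language setting matters.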

Line spacing (or leading) can be fully controlled by the designer in most contemporary typesetting software. When working automatically, the paragraph composer may set the line height to accommodate the font with the tallest vertical metric, i.e. the height of the glyph boundaries. This is convenient when mixing multiple scripts with different use of vertical space, but it may lead to uneven line spacing (see Figure 5).

Figure 5: The third line in this example gets automatically shifted lower to accommodate the predefined vertical metrics of the Devanagari and Arabic fonts, although the particular words would fit well. The fonts used are Skolar PE Regular, Skolar Devanagari Regular, and Nassim Arabic Regular.
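The behaviour in Figure 5 can be approximated with a short calculation. This sketch assumes that automatic line height is the tallest ascent plus the deepest descent among the fonts present on a line. The `FontMetrics` class and all numbers are invented for illustration and do not describe the fonts named in the caption.

```python
from dataclasses import dataclass


@dataclass
class FontMetrics:
    name: str
    ascent: int   # units above the baseline
    descent: int  # units below the baseline, as a positive number


# Invented metrics in font units (not taken from any real font).
LATIN = FontMetrics("latin", 750, 250)
DEVANAGARI = FontMetrics("devanagari", 900, 400)
ARABIC = FontMetrics("arabic", 850, 450)


def auto_line_height(fonts_on_line):
    """Line height that accommodates the declared metrics of every font used."""
    ascent = max(font.ascent for font in fonts_on_line)
    descent = max(font.descent for font in fonts_on_line)
    return ascent + descent


lines = [[LATIN], [LATIN], [LATIN, DEVANAGARI, ARABIC]]
print([auto_line_height(line) for line in lines])
```

Because the calculation uses declared font metrics rather than the glyphs actually set, the mixed-script line is taller even when its particular words would have fitted in the smaller line height.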

Paragraphs that mix scripts with different reading directions create further challenges. Bi-directional setting is especially common in Semitic languages such as Arabic or Hebrew, which run from right to left but often use Latin-script numerals and text snippets, e.g. email or web addresses. The composer changes the direction and position of the characters as users type (see Figure 6). This becomes crucial in word processors and online forms.

Inputting Latin-script text within Arabic

Figure 6: Typing an email address in the Latin script within an Arabic-script context may seem counter-intuitive to users who are not familiar with bi-directional typesetting. Notice that the "@" and "." get positioned initially on the left, following the Arabic-script direction. They are moved to the right only after the following Latin-script letter is typed and it is clear they form a part of the email address that is treated as a whole. Note that the support for bi-directional content varies across software.
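One way to read Figure 6, assuming the software follows the Unicode bidirectional algorithm, is to look at the bidirectional class that Unicode assigns to each character. Letters have a strong direction, whereas characters such as "@" and "." do not and take their direction from the surrounding context. The sketch below only inspects these classes with Python's standard `unicodedata` module; it does not reorder text, and the helper name `bidi_classes` is made up for this example.

```python
import unicodedata


def bidi_classes(text):
    """Pair every character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# L = strong left-to-right, AL = Arabic letter (strong right-to-left),
# R = other strong right-to-left, EN = European number,
# CS = common separator, ON = other neutral.
for ch, bidi_class in bidi_classes("a@b.c1\u0627\u05d0"):
    print(f"U+{ord(ch):04X} {bidi_class}")
```

While "@" is the last character typed, it sits between a Latin letter and the right-to-left paragraph context, so it follows the paragraph direction. Once another Latin letter follows, it is enclosed by left-to-right characters and joins them.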

Hyphenation brings about another potentially confusing challenge where a word hyphenated in one direction continues on the next line, following the paragraph direction of the main script (see Figure 7).

Hyphenation of a Latin text within Arabic paragraph

Figure 7: The Arabic text is running from right to left, and since the rest of the line is too short for the complete Latin term, the word "TAXIFOLIA" has to be hyphenated. The remainder of the hyphenated word is placed at the beginning of the line, i.e. to the right, but aligned from left to right. The right part shows a schematic flow of the text; the flow of the Latin word is marked in red dashed arrows. Note that only the Latin text can be hyphenated as the Arabic language does not use hyphenation. From the newspaper Asharq Al-Awsat (2000, p. 15), supplied by Fiona Ross.

From contours to pixels

The majority of contemporary fonts use Bézier curves to describe glyph contours. To display geometrically defined smooth contours on contemporary screens and printers, the contours need to be rasterised, i.e. converted to their visual representation in pixels. The results can differ greatly across computer platforms and printers. The technology used for rasterisation is in principle script-independent, but it still requires consideration.
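To make the step from contour to pixels concrete, the following sketch samples a single cubic Bézier segment and records which pixels of a grid it passes through. It is a deliberately naive illustration under stated assumptions: it traces an outline rather than filling a shape, it ignores hinting and anti-aliasing, and the function names and the sample curve are invented for this example.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Point on a cubic Bézier curve at parameter t (0 <= t <= 1)."""
    u = 1 - t
    x = u**3 * p0[0] + 3 * u * u * t * p1[0] + 3 * u * t * t * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u * u * t * p1[1] + 3 * u * t * t * p2[1] + t**3 * p3[1]
    return x, y


def rasterise_curve(points, pixels_per_unit, samples=200):
    """Set of pixel coordinates touched by sampled points of the curve."""
    pixels = set()
    for i in range(samples + 1):
        x, y = cubic_bezier(*points, i / samples)
        pixels.add((int(x * pixels_per_unit), int(y * pixels_per_unit)))
    return pixels


# An arch-shaped segment in a 1 x 1 design space (illustrative values).
arch = ((0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0))
coarse = rasterise_curve(arch, pixels_per_unit=4)
fine = rasterise_curve(arch, pixels_per_unit=16)
print(len(coarse), len(fine))
```

The same contour is described by far fewer pixels on the coarse grid, which is where detail in visually dense glyphs starts to get lost.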

When dealing with visually dense scripts such as Devanagari, Chinese, or the Japanese scripts (Kanji, Hiragana, Katakana), it is worth paying attention to the quality of the rasterised image to ensure that important visual features are well preserved in low resolution (see Figure 8).

Figure 8: Designers simplify complex glyphs of Japanese Kanji at small sizes; the top row shows samples of the same glyph in the font Meiryo in three different sizes. The bottom row shows the same character in three calligraphic styles where the simplification is done for aesthetic reasons. Illustration from Larson (2007).

Depending on the rasterisation technique and the font used, there may be a marked difference between the overall weight of whole categories of shapes. For example, weight differences between round and straight strokes may become an issue when juxtaposing a script that uses a lot of straight strokes, such as Latin, with a script that uses more rounded strokes, such as Greek. The latter may turn out noticeably darker in smaller sizes even though it looks balanced in larger sizes. If the intention is for the two to look equally salient, the designer may need to implement a visual compensation: use a lighter or darker weight for one of the scripts.

Since the last quarter of the last century, the technology of digital typography has made considerable progress in supporting many of the world’s scripts. Yet it is still not perfect to the point where it would always work smoothly without a designer’s intervention. Designers need to be aware of all the principles, limitations, and issues that come with each script and language to ensure quality in their digital work.

References

Daniels, P. T., & Bright, W. (1996). The world’s writing systems. Oxford University Press.

Elyaakoubi, M., & Lazrek, A. (2010). Justify just or just justify. The Journal of Electronic Publishing, 13(1). http://dx.doi.org/10.3998/3336451.0013.105

Gosvāmī, J. (2014). Śrī Bhagavad Sandarbha. Jiva Institute. The digital version is available from http://sandarbhas.jiva.org

Haralambous, Y. (2021). Breaking Arabic: the creative inventiveness of Uyghur script reforms. Design Regression. Retrieved 17 January 2022, from /article/breaking-arabic.

Larson, K. (2007). The technology of text. IEEE Spectrum, 44(5), 26–31. The digital version is available from https://doi.org/10.1109/MSPEC.2007.352529

Nemeth, T. (2020). On Arabic justification. The Journal of Electronic Publishing, 23(1). https://doi.org/10.3998/3336451.0023.104

The Unicode Standard (Version 14.0). (2021). The Unicode Consortium. The most recent version is available from http://unicode.org
