David Březina

Character complexity and redundancy in writing systems over human history


Changizi, M. A., & Shimojo, S. (2005). Character complexity and redundancy in writing systems over human history. Proceedings. Biological sciences / The Royal Society, 272, 267-275.


The idea of universal grammar, usually attributed to Noam Chomsky, proposes that certain structural principles of languages are motivated by the limits of our genetic predispositions and are therefore shared universally across human languages. If languages share common principles, what about human writing?

Mark A. Changizi and Shinsuke Shimojo inquire into the topological complexity of scripts (they use the term writing systems to mean the same), looking for regularity in the number of strokes and stroke types that are used to construct character shapes. They analyse more than a hundred of the world’s scripts and, through statistical analysis, conclude that the average number of strokes required to construct a character (which they refer to as character length) is approximately three, regardless of the script or the number of characters in the script’s repertoire. The reported average is 2.91, with a standard error of 0.09. Similarly, the authors report that redundancy, i.e. the proportion of stroke combinations that are considered valid characters in a script out of all stroke combinations that are theoretically possible, averages around 50%. Neither the average stroke count nor the redundancy varies much as a function of the size of the script’s repertoire.

Note that the number of stroke types tends to increase with the size of a script’s repertoire. In other words, in order to produce and handle larger repertoires of characters, humans add new types of strokes rather than construct characters from a larger number of strokes.
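The trade-off can be illustrated with a little arithmetic. The following is a minimal sketch under simplifying assumptions, not the authors' method: it treats a character as an ordered sequence of exactly `length` strokes drawn from `stroke_types` distinct types, so that at most `stroke_types ** length` shapes are possible, and it assumes that only a fixed fraction of those shapes (0.5 here, echoing the reported redundancy) is used for characters. The function names and parameters are hypothetical.

```python
def min_stroke_types(repertoire_size, length=3, used_fraction=0.5):
    """Smallest number of stroke types that can cover the repertoire
    when every character has `length` strokes (assumed model)."""
    types = 1
    while used_fraction * types ** length < repertoire_size:
        types += 1
    return types


def min_length(repertoire_size, stroke_types, used_fraction=0.5):
    """Smallest character length that can cover the repertoire
    when the number of stroke types is held fixed (assumed model)."""
    length = 1
    while used_fraction * stroke_types ** length < repertoire_size:
        length += 1
    return length


# Growing the repertoire while keeping characters three strokes long
# requires only a modest increase in the number of stroke types.
for size in (10, 100, 1000):
    print(size, min_stroke_types(size))

# Keeping three stroke types instead forces much longer characters.
print(min_length(1000, stroke_types=3))
```

Under these assumptions, a thousand-character repertoire needs 13 stroke types at three strokes per character, whereas holding the inventory at three stroke types would push the character length to seven strokes.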

Unfortunately, the description of the methodology in the paper is not sufficiently detailed, perhaps due to length limits imposed by the publication. Consequently, the approach to script and character design comes across as naïve in places. The visual presentation of the scripts is limited to only a few characters from each, which makes the analysis difficult to review. Here are a few considerations that could have been addressed:

Figure 1: Illustration of the method for determining the number of strokes in a character. Figure taken from the reviewed paper.

Admittedly, critical appreciation of scripts’ design becomes a gargantuan task when dealing with over one hundred of the world’s scripts, but with all these omissions can we still consider the data representative of the world’s scripts?

The study covers 22 numeric scripts alongside the 93 non-numeric scripts. For the numeric scripts, the reported average number of strokes in a character is 1.95 (SEM = 0.14), which suggests that characters in numeric scripts tend to be topologically simpler. By itself, this is an interesting result.

Considering ease of reading to be the principal selective pressure on scripts, the conclusion puts forward several explanations for the surprising cross-script constants (numeric or not). These range from the limits of short-term memory and principles of character recognition to a visual-ecological explanation: that the topologies of characters match those found in objects in natural scenes. This last idea is explored further in a more recent paper (Changizi et al., 2006).

Universal grammar in languages provides grounds for the belief that people from distant parts of the planet are indeed quite similar, that they can ultimately understand each other, and learn each other’s languages. Despite the drawbacks regarding its methodology, the paper shows an exciting way to use statistical analysis to gain general insights regarding the way humans read and write.


Changizi, M. A., Zhang, Q., Ye, H., & Shimojo, S. (2006). The structures of letters and symbols throughout human history are selected to match those found in objects in natural scenes. The American Naturalist, 167, E117-39.

Daniels, P. T., & Bright, W. (1996). The World’s Writing Systems. Oxford University Press.
