Examining the research methods used by legibility legends Tinker and Paterson
In a previous article I described revisiting research on line length after coming across some interesting findings which didn’t fit with a commonly held view: long line lengths may not be as problematic as we thought. When I went back to the research by Tinker and Paterson, questions arose over the validity of their methods. This article takes a closer, critical look at what they did, because their work is frequently cited in educational design literature and by other legibility researchers.
Motivation
Why am I so concerned about the studies conducted by Tinker and Paterson? Because until I read about a flaw in their method (Parker, 2019; Parker et al., 2019), I had considered their work to be a solid foundation for subsequent legibility research. Tinker is described as “an internationally recognized authority on legibility of print” (Tinker, 1963, p. iv). Having been alerted to this potential problem, I sought out the original articles, realising that I had previously been over-reliant on the summaries (e.g. Tinker, 1963; 1965). I also looked to see if other researchers and writers had critiqued their method.
Test material
Essentially the same experimental design was used throughout all Tinker and Paterson’s studies measuring speed of reading. A standardized test provided the reading material: the Chapman-Cook speed of reading test. The test involves reading paragraphs of two sentences and underlining the word which spoils the meaning. The number of paragraphs read in 1 minute and 45 seconds is measured. Sets of paragraphs are printed as individual forms and each form has different content, i.e. a unique set of paragraphs. See Figure 1 for an example of one form (Form B).
Tinker and Paterson (1929): an example of their experimental method
Their first published study into line length reviewed the existing results from experiments and the opinions of advertisers to conclude that “the problem of optimal line length is in a very unsatisfactory state” (Tinker & Paterson, 1929, pp. 209–210). In this study, the paragraphs were printed in 10pt Scotch Roman, varying the line length from 59 mm to 186 mm, set solid. They compared line lengths by using a standard: a line length of 80 mm. See Figure 2 for a simulation of part of the two forms. Each pair (standard and test line length) was tested on a different group of 80 university students adding up to a total of 560 participants, across 7 comparisons (see Table 1).
They stated that in all 7 groups, the line length of 80 mm was read faster than the alternative, only one of which was shorter (59 mm); the rest were longer than 80 mm. All differences except one (80 mm versus 97 mm) were reported as statistically significant.
Group | Test form | Standard line length | Test form | Test line length |
---|---|---|---|---|
Group 1 | Form A | 80 mm | Form B | 59 mm |
Group 2 | Form A | 80 mm | Form B | 97 mm |
Group 3 | Form A | 80 mm | Form B | 114 mm |
Group 4 | Form A | 80 mm | Form B | 136 mm |
Group 5 | Form A | 80 mm | Form B | 152 mm |
Group 6 | Form A | 80 mm | Form B | 168 mm |
Group 7 | Form A | 80 mm | Form B | 186 mm |
The flaw
As mentioned above, each form of the Chapman-Cook test (Form A and Form B) has different content, i.e. a different set of paragraphs. As Table 1 shows, in each group Form A is used for the standard (line length of 80 mm) and Form B for the alternative, the test line length. Each pair of line lengths was read by a different group of participants, who always read Form A before Form B, so the test line length always followed the standard. This means that the content of each form is confounded with line length: faster reading of the 80 mm line length could be attributed to Form A having paragraphs that are easier to read than those of Form B. Moreover, the effect of practice is not controlled, because the test line length is always read second.
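The confound can be made concrete with a small sketch. It assumes a simple additive model in which a score is a baseline plus separate effects of form, reading position and line length. The model and every number in it are hypothetical, chosen only for illustration; none come from Tinker and Paterson's data.

```python
# Hypothetical additive model of a reading score (paragraphs read).
# All effect sizes are invented for illustration, not taken from the studies.
BASELINE = 18.0
FORM_EFFECT = {"A": 0.0, "B": -0.5}            # assume Form B is slightly harder
POSITION_EFFECT = {1: 0.0, 2: 0.3}             # assume a small practice gain on the second form
LINE_EFFECT = {"standard": 0.0, "test": 0.0}   # assume line length has NO true effect


def expected_score(form: str, position: int, line: str) -> float:
    """Expected number of paragraphs read under the additive model."""
    return BASELINE + FORM_EFFECT[form] + POSITION_EFFECT[position] + LINE_EFFECT[line]


# The single configuration used in the line length study:
# Form A in the standard line length first, Form B in the test line length second.
standard_score = expected_score("A", 1, "standard")
test_score = expected_score("B", 2, "test")
observed_difference = standard_score - test_score

print(f"Standard minus test: {observed_difference:.2f}")
```

Under these assumed values the standard appears to be read faster even though the line length effect was set to zero: the observed difference is produced entirely by the form and position effects, and the design gives no way to separate them.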
Their control condition
Recognising this potential problem, Tinker and Paterson’s solution was to include what they considered to be a control condition where one group receives Forms A and B set in the same line length (80 mm). Any difference due to practice or lack of equivalence of content is then used as a “correction factor” (Paterson & Tinker, 1940, p. 42) that is applied to correct for potential effects of the order and content of the forms. (See Table 2 for an example of how the correction is applied: Group 1 reads both forms in the same typeface, and its difference of 0.71 is subtracted from the difference found in each of the other groups.) This correction was considered adequate to conclude that a difference in reading speed could be attributed to the typographical arrangement, rather than the test procedure.
Group (1) | Test form (2) | Typeface (3) | Average number of paragraphs read (4) | Difference between A and B (5) | Corrected difference between A and B (6) |
---|---|---|---|---|---|
1 | A | Scotch Roman | 19.1 | 0.71 | 0 |
1 | B | Scotch Roman | 18.39 | | |
2 | A | Scotch Roman | 18.78 | 0.64 | -0.07 |
2 | B | Garamont | 18.14 | | |
3 | A | Scotch Roman | 18.81 | 0.74 | 0.03 |
3 | B | Antique | 18.07 | | |
4 | A | Scotch Roman | 19.42 | 0.91 | 0.2 |
4 | B | Bodoni | 18.51 | | |
5 | A | Scotch Roman | 19.03 | 0.91 | 0.2 |
5 | B | Old Style | 18.12 | | |
6 | A | Scotch Roman | 18.43 | 0.94 | 0.23 |
6 | B | Caslon O.S. | 17.49 | | |
7 | A | Scotch Roman | 19.06 | 1.13 | 0.42 |
7 | B | Kabel Lite | 17.93 | | |
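The arithmetic behind columns 5 and 6 of Table 2 can be reproduced in a few lines. This is a minimal sketch: the means are transcribed from the table, while the function and variable names are my own and the treatment of Group 1 as the control follows the description of the correction factor above.

```python
# Mean paragraphs read, transcribed from Table 2:
# (typeface used for Form B, Form A mean, Form B mean).
TABLE_2 = [
    ("Scotch Roman", 19.10, 18.39),  # Group 1: control, same typeface on both forms
    ("Garamont", 18.78, 18.14),
    ("Antique", 18.81, 18.07),
    ("Bodoni", 19.42, 18.51),
    ("Old Style", 19.03, 18.12),
    ("Caslon O.S.", 18.43, 17.49),
    ("Kabel Lite", 19.06, 17.93),
]


def corrected_differences(rows, control_index=0):
    """Return (typeface, raw A-B difference, corrected difference) per group.

    The correction factor is the control group's own A-B difference,
    subtracted from every group's raw difference.
    """
    raw = [round(form_a - form_b, 2) for _, form_a, form_b in rows]
    correction = raw[control_index]
    return [
        (typeface, difference, round(difference - correction, 2))
        for (typeface, _, _), difference in zip(rows, raw)
    ]


results = corrected_differences(TABLE_2)
for typeface, difference, corrected in results:
    print(f"{typeface:<12} difference {difference:.2f}  corrected {corrected:+.2f}")
```

Note that every corrected value depends on the single control difference of 0.71, so any sampling error in that one group is carried into all six comparisons.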
No control group was used in the 1929 line length study, which explains why there are more significant differences described in the 1929 paper than summarised in the subsequent book (Paterson & Tinker, 1940, p. 42), where the correction factor is applied (reducing the size of the differences). In the 1940 publication, variations in line length between 80 mm and 152 mm (i.e. 80, 97, 114, 136, 152 mm) are said to have little or no effect on speed of reading. An unusually short line (59 mm) may slow reading. At line lengths of 168 and 186 mm “the evidence is clear that reading speed is significantly retarded” (Paterson & Tinker, 1940, p. 43). This is a clear statement that long lines slow down reading. See Table 3 for both the uncorrected and corrected data.
This control group technique is questionable because the correction factor is obtained from one group of readers and then applied to different groups of readers. Even if the experimental conditions are the same for all groups, variability among readers may mean that the correction factor is not appropriate for the other groups. It is perfectly valid to compare different groups of readers (known as between-subjects comparisons), but the data do not seem to have been analysed in this manner.
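One way to see the cost of borrowing a correction from another group is to write down the sampling variance of a corrected difference. The following is a sketch under stated assumptions, not an analysis taken from the original papers: the groups are treated as independent random samples of equal size, and the individual A-minus-B difference scores are assumed to have the same variance in every group. All symbols are my own notation.

```latex
% Assumed notation (not from the original papers):
%   \bar{d}_g  : mean A-minus-B difference in test group g
%   \bar{d}_c  : mean A-minus-B difference in the control group
%   \sigma_d^2 : variance of individual difference scores, assumed equal across groups
%   n          : number of readers per group
\begin{align*}
\operatorname{Var}(\bar{d}_g) &= \frac{\sigma_d^2}{n} \\
\operatorname{Var}(\bar{d}_g - \bar{d}_c) &= \frac{\sigma_d^2}{n} + \frac{\sigma_d^2}{n} = \frac{2\sigma_d^2}{n} \\
\operatorname{Cov}(\bar{d}_g - \bar{d}_c,\ \bar{d}_h - \bar{d}_c) &= \operatorname{Var}(\bar{d}_c) = \frac{\sigma_d^2}{n} \qquad (g \neq h)
\end{align*}
```

Under these assumptions a corrected difference has twice the sampling variance of an uncorrected one, and because the same control mean is subtracted every time, the corrected differences are not independent of one another. A significance test that ignores either point would overstate the evidence.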
Their response to criticism
Tinker and Paterson were aware of their “methodological difficulties” and published a paper on methodological considerations (Tinker & Paterson, 1936). This paper describes a series of “special experiments” they conducted to address all the issues they could think of. These include discussion of the control condition and differences among participant groups. They explore the differences and decide they are within suitable limits (Tinker & Paterson, 1936, p. 135). This is questionable (see note 5). Subsequently, Paterson and Tinker (1940, p. 188) acknowledge that “Some critics might believe that these differences would affect the typographical comparisons involved in any one study”. I believe that this is a possibility; I am unconvinced by the data they use to support their argument that the variations in the average scores of the 7 groups do not affect their results. But we do not know how the results might be affected.
Group (1) | Form (2) | Line length (3) | Average number of paragraphs read (4) | Difference between A and B (5) | Corrected difference between A and B (6) | Percent difference after correction (7) |
---|---|---|---|---|---|---|
1 | A | 80 mm | 18.31 | 1.25 | 0.75 | -4.1 |
1 | B | 59 mm | 17.06 | | | |
2 | A | 80 mm | 18.46 | 0.5 | 0 | 0 |
2 | B | 97 mm | 17.96 | | | |
3 | A | 80 mm | 18.19 | 0.96 | 0.46 | -2.5 |
3 | B | 114 mm | 17.23 | | | |
4 | A | 80 mm | 18.98 | 0.93 | 0.43 | -2.3 |
4 | B | 136 mm | 18.05 | | | |
5 | A | 80 mm | 18.94 | 1.14 | 0.64 | -3.4 |
5 | B | 152 mm | 17.8 | | | |
6 | A | 80 mm | 18.88 | 1.47 | 0.97 | -5.1 |
6 | B | 168 mm | 17.41 | | | |
7 | A | 80 mm | 18.31 | 1.88 | 1.38 | -7.5 |
7 | B | 186 mm | 16.43 | | | |
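The last two columns of Table 3 can be recomputed from the means as a check. Two things in this sketch are inferences rather than facts stated in the sources: the correction factor of 0.50 is simply the constant gap between columns 5 and 6, and the percentage is assumed to be taken relative to the Form A (80 mm) mean, which is the base that reproduces the printed values.

```python
# Inferred, not stated in the sources: column 5 minus column 6 is 0.50 in every row.
CORRECTION = 0.50

# Transcribed from Table 3: (test line length in mm, Form A mean at 80 mm, Form B mean).
TABLE_3 = [
    (59, 18.31, 17.06),
    (97, 18.46, 17.96),
    (114, 18.19, 17.23),
    (136, 18.98, 18.05),
    (152, 18.94, 17.80),
    (168, 18.88, 17.41),
    (186, 18.31, 16.43),
]


def percent_difference(form_a_mean, form_b_mean, correction=CORRECTION):
    """Corrected A-B difference as a percentage of the Form A mean.

    The sign is negative when the test line length was read more slowly.
    Using the Form A mean as the base is an assumption that matches the table.
    """
    corrected = round(form_a_mean - form_b_mean, 2) - correction
    # Adding 0.0 turns a negative zero into a plain zero for tidy printing.
    return round(-100 * corrected / form_a_mean, 1) + 0.0


percentages = [percent_difference(a, b) for _, a, b in TABLE_3]
for (line_length, _, _), percent in zip(TABLE_3, percentages):
    print(f"{line_length:>3} mm: {percent:+.1f}%")
```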
A better solution to the flaw in their method
Tinker and Paterson spent considerable effort justifying their methods rather than addressing the problem by counterbalancing the conditions across different groups of participants. Counterbalancing avoids confounding the content of the forms with line length and balances out effects of practice. As a consequence, the results will have greater validity because they are measuring the effects of line length and not other unintended factors.
To counterbalance the conditions:
- the standard and test line lengths are each paired with both Form A and Form B (to address the confound)
- each form, and each line length, is read both first and second (balancing practice effects).
Instead of using their single configuration:
Group | 1st form | Line length | 2nd form | Line length |
---|---|---|---|---|
Group 1 | Form A | Standard line length | Form B | Test line length |
three other configurations are added to create a balanced design, with different groups of participants assigned to each configuration:
Group | 1st form | Line length | 2nd form | Line length |
---|---|---|---|---|
Group 1a | Form A | Standard line length | Form B | Test line length |
Group 1b | Form B | Standard line length | Form A | Test line length |
Group 1c | Form A | Test line length | Form B | Standard line length |
Group 1d | Form B | Test line length | Form A | Standard line length |
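The four configurations above can be generated mechanically, and a short check shows why averaging across them isolates the line length effect. This is a sketch assuming a simple additive model of form, position and line length effects; the effect sizes are hypothetical values picked for illustration and the function names are my own.

```python
from collections import Counter
from itertools import permutations, product

FORMS = ("A", "B")
LINE_LENGTHS = ("standard", "test")

# Hypothetical effect sizes (paragraphs read), invented for illustration.
BASELINE = 18.0
FORM_EFFECT = {"A": 0.0, "B": -0.5}
POSITION_EFFECT = {1: 0.0, 2: 0.3}
LINE_EFFECT = {"standard": 0.0, "test": -0.4}


def counterbalanced_configurations():
    """Cross both line-length orders with both form orders (Groups 1a to 1d)."""
    return [
        tuple(zip(form_order, line_order))
        for line_order, form_order in product(permutations(LINE_LENGTHS), permutations(FORMS))
    ]


def standard_minus_test(config):
    """Expected score on the standard line length minus the test line length."""
    scores = {}
    for position, (form, line) in enumerate(config, start=1):
        scores[line] = BASELINE + FORM_EFFECT[form] + POSITION_EFFECT[position] + LINE_EFFECT[line]
    return scores["standard"] - scores["test"]


configs = counterbalanced_configurations()

# Balance checks: how often each form meets each line length, and which form is read first.
pairings = Counter(pair for config in configs for pair in config)
first_forms = Counter(config[0][0] for config in configs)

differences = [standard_minus_test(config) for config in configs]
balanced_estimate = sum(differences) / len(differences)

for label, config, difference in zip("abcd", configs, differences):
    (form_1, line_1), (form_2, line_2) = config
    print(f"Group 1{label}: Form {form_1} ({line_1}), then Form {form_2} ({line_2}); difference {difference:.2f}")
print(f"Average across groups: {balanced_estimate:.2f}")
```

Within any single group the difference is distorted by the form and position effects, but the average over the four groups recovers the assumed line length effect (0.4 paragraphs in favour of the standard) because the other effects appear equally often on each side of the comparison.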
As I tried to work out possible reasons why Tinker and Paterson hadn’t used this experimental design, I was surprised to find that they had. In an earlier paper (Tinker & Paterson, 1928) they explain why they use this design, which they describe as the A B B A method of sequence, addressing the criticisms outlined above. They state that this method avoids “differences due to difficulty of the alternate forms A and B” and avoids “the presence of any marked practice effect in passing from one trial to the second” (Tinker & Paterson, 1928, p. 362).
Why did they stop using this experimental design? They justify abandoning the A B B A sequence method to permit “a simpler and more straightforward comparison” (Paterson & Tinker, 1929, p. 125). It’s possible that the A B B A configuration may have introduced some practical difficulties when administering the test in a classroom, or the cost of producing the test material may have been prohibitive at that time. But in retrospect, their approach seems somewhat misguided.
Conclusion
This critique of Tinker and Paterson’s research addresses a very specific issue from the perspective of current experimental psychology, which now has more sophisticated statistical methods than those available at the time of the studies. Nevertheless, their method was criticised by their contemporaries. More recent criticism (Berkson & Enneson, 2013), though questioning Tinker’s methodological principles and results, does not refer to the lack of counterbalancing.
From a designer’s perspective, there may be more general and relevant criticisms that could be levelled at these speed of reading tests, such as:
- the paragraphs are too short
- reading is interrupted by crossing out words
- measuring reading ease or comfort would be better than measuring reading speed
But ultimately, when the design of the study calls the validity of the results into question, other objections are less relevant.
Some implications for designers and researchers
Tinker and Paterson’s findings on line length are consistent with typographers’ recommendations for good printing practice (e.g. Bringhurst, 2019). These recommendations may be based on the research (e.g. Spencer, 1968; Schriver, 1997) or the “inherited experience of five hundred years of printing history” (McLean, 1980, p. 47). This agreement suggests that designers should continue observing guidance provided by practitioners unless new research contradicts our current conventions. As I mentioned in my previous article, there have been few studies into line length in print following Tinker and Paterson’s work. Now seems to be a good time for researchers to conduct new studies.
References
Berkson, W., & Enneson, P. (2013). Readability: Discovery and disputation. Typography Papers, 9, 117–151. http://typography.network/wp-content/uploads/2023/08/Berkson_Enneson_TypPp_9_Readability_discovery_and_disputation.pdf
Bringhurst, R. (2019). The elements of typographic style (4th ed.). Hartley & Marks.
Cobb, E. K. (1944). The relation between certain phases of reading ability and speed and accuracy in typewriting [Master of Science dissertation, North Carolina University]. NC Digital Online Collection of Knowledge and Scholarship. https://libres.uncg.edu/ir/uncg/f/cobb_emma_1944.pdf
McLean, R. (1980). The Thames and Hudson manual of typography. Thames and Hudson.
Parker, A. J. (2019). The return-sweep in reading [Doctoral thesis, Bournemouth University]. BURO. http://eprints.bournemouth.ac.uk/32170/
Parker, A. J., Nikolova, M., Slattery, T. J., Liversedge, S. P., & Kirkby, J. A. (2019). Binocular coordination and return-sweep saccades among skilled adult readers. Journal of Vision, 19(6), 10. https://doi.org/10.1167/19.6.10
Paterson, D. G., & Tinker, M. A. (1929). Studies of typographical factors influencing speed of reading. II. Size of type. Journal of Applied Psychology, 13(2), 120–130. https://doi.org/10.1037/h0074167
Paterson, D. G., & Tinker, M. A. (1932). Studies of typographical factors influencing speed of reading: X. Style of type face. Journal of Applied Psychology, 16(6), 605–613. https://doi.org/10.1037/h0070644
Paterson, D. G., & Tinker, M. A. (1940). How to make type readable. Harper and Row.
Schriver, K. A. (1997). Dynamics in document design: Creating text for readers. Wiley.
Spencer, H. (1968). The visible word. Royal College of Art.
Tinker, M. A. (1963). Legibility of print. Iowa State University Press.
Tinker, M. A. (1965). Bases for effective reading. Lund Press.
Tinker, M. A., & Paterson, D. G. (1928). Influence of type form on speed of reading. Journal of Applied Psychology, 12, 359–368. https://doi.org/10.1037/h0073699
Tinker, M. A., & Paterson, D. G. (1929). Studies of typographical factors influencing speed of reading: III. Length of line. Journal of Applied Psychology, 13, 205–219. https://doi.org/10.1037/h0073597
Tinker, M. A., & Paterson, D. G. (1936). Studies of typographical factors influencing speed of reading. XIII. Methodological considerations. Journal of Applied Psychology, 20(1), 132–145. https://doi.org/10.1037/h0054333