Article
Mary Dyson

Examining the research methods used by legibility legends Tinker and Paterson

In a previous article I described revisiting research on line length, having come across some interesting findings that didn’t fit a commonly held view: it seems that long lines may not be as problematic as we thought. When I went back to the research by Tinker and Paterson, I found that questions had been raised over the validity of their methods. This article takes a closer, critical look at what they did, because their work is frequently cited in educational design literature and by other legibility researchers.

Motivation

Why am I so concerned about the studies conducted by Tinker and Paterson? Because until I read about a flaw in their method (Parker, 2019; Parker et al., 2019), I had considered their work to be a solid foundation for subsequent legibility research. Tinker is described as “an internationally recognized authority on legibility of print” (Tinker, 1963, p. iv). Having been alerted to this potential problem, I sought out the original articles, realising that I had previously been over-reliant on the summaries (e.g. Tinker, 1963; 1965). I also looked to see if other researchers and writers had critiqued their method.

Test material

Essentially the same experimental design was used in all of Tinker and Paterson’s studies measuring speed of reading. The reading material came from a standardized test, the Chapman-Cook speed of reading test, in which readers read paragraphs of two sentences and underline the word that spoils the meaning. The score is the number of paragraphs read in 1 minute and 45 seconds. Sets of paragraphs are printed as individual forms, and each form has different content, i.e. a unique set of paragraphs. See Figure 1 for an example of one form (Form B).

Tinker and Paterson (1929): an example of their experimental method

Their first published study of line length reviewed existing experimental results and the opinions of advertisers, concluding that “the problem of optimal line length is in a very unsatisfactory state” (Tinker & Paterson, 1929, pp. 209–210). In this study, the paragraphs were printed in 10pt Scotch Roman, set solid, with line lengths varying from 59 mm to 186 mm. Each line length was compared against a standard line length of 80 mm. See Figure 2 for a simulation of part of the two forms. Each pair (standard and test line length) was tested on a different group of 80 university students, adding up to a total of 560 participants across 7 comparisons (see Table 1).


Figure 1: Form B of Chapman-Cook Speed of Reading. The image is taken from Cobb (1944, p. 37). This example has a time limit of two and a half minutes.

They reported that in all 7 groups the 80 mm line was read faster than the test line length. Only one test line length was shorter than the standard (59 mm); the other six were longer. All the differences except one (80 mm versus 97 mm) were reported as statistically significant.

  Test form Standard line length Test form Test line length
Group 1 Form A 80 mm Form B 59 mm
Group 2 Form A 80 mm Form B 97 mm
Group 3 Form A 80 mm Form B 114 mm
Group 4 Form A 80 mm Form B 136 mm
Group 5 Form A 80 mm Form B 152 mm
Group 6 Form A 80 mm Form B 168 mm
Group 7 Form A 80 mm Form B 186 mm
Table 1: Line lengths given to 7 groups of students. Each group first receives the standard line length of 80 mm and then the test line length. (Tinker and Paterson, 1929, p. 210)


Figure 2: Simulation of two of the paragraphs from Form A (80 mm) and two from Form B (7 different test line lengths).

The flaw

As mentioned above, each form of the Chapman-Cook test (Form A and Form B) has different content, i.e. a different set of paragraphs. As Table 1 shows, in each group Form A is used for the standard (80 mm) and Form B for the alternative, the test line length. Each pair of line lengths was read by a different group of participants, and every group read Form A before Form B, so the test line length always followed the standard. This means that the content of each form is confounded with line length: faster reading of the 80 mm line could be attributed to Form A having paragraphs that are easier to read than those of Form B. Moreover, because the test line length was always read second, the effect of practice is not controlled.
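To see why content is confounded with line length, it may help to write the design out as a simple additive model. The sketch below is a hypothetical, noise-free illustration, not Tinker and Paterson’s analysis: the baseline score, the function names (`score`, `tinker_paterson_difference`) and every effect size are assumptions invented for the example. It shows that a harder Form B and a slower test line length produce exactly the same observed difference, so data from this design alone cannot tell them apart.

```python
# Hypothetical, noise-free additive model of reading speed.
# Every number and name here is invented for illustration only.
BASE = 18.0  # hypothetical paragraphs read under the reference condition


def score(form, length, position, form_b_penalty, order_effect, test_length_effect):
    """Paragraphs read for one form, line length and reading position (1 or 2)."""
    s = BASE
    if form == "B":
        s -= form_b_penalty       # Form B content assumed harder by this amount
    if position == 2:
        s += order_effect         # practice (+) or fatigue (-) on the second form
    if length == "test":
        s += test_length_effect   # the effect the study is meant to measure
    return s


def tinker_paterson_difference(form_b_penalty, order_effect, test_length_effect):
    """Standard minus test when every group reads Form A at 80 mm first
    and Form B at the test line length second."""
    args = (form_b_penalty, order_effect, test_length_effect)
    standard = score("A", "standard", 1, *args)
    test = score("B", "test", 2, *args)
    return standard - test


# Scenario 1: line length has no effect; Form B is simply harder.
only_content = tinker_paterson_difference(1.0, 0.0, 0.0)
# Scenario 2: the forms are equivalent; the test line length slows reading.
only_length = tinker_paterson_difference(0.0, 0.0, -1.0)
print(only_content, only_length)  # 1.0 1.0
```

Because the form, the reading position and the line length always change together, any combination of effects that sums to the same value is indistinguishable in this design.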

Their control condition

Recognising this potential problem, Tinker and Paterson included what they considered to be a control condition, in which one group receives Forms A and B set in the same line length (80 mm). Any difference found in this group, whether due to practice or to a lack of equivalence in content, is then used as a “correction factor” (Paterson & Tinker, 1940, p. 42) and subtracted from the differences in the other groups to correct for potential effects of the order and content of the forms. (See Table 2 for an example of how the correction is applied.) They considered this correction adequate to conclude that a difference in reading speed could be attributed to the typographical arrangement rather than to the test procedure.

Group (1) Test form (2) Typeface (3) Average number of paragraphs read (4) Difference between A and B (5) Corrected difference between A and B (6)
1 A Scotch Roman 19.1 0.71 0
  B Scotch Roman 18.39    
2 A Scotch Roman 18.78 0.64 -0.07
  B Garamont 18.14    
3 A Scotch Roman 18.81 0.74 0.03
  B Antique 18.07    
4 A Scotch Roman 19.42 0.91 0.2
  B Bodoni 18.51    
5 A Scotch Roman 19.03 0.91 0.2
  B Old Style 18.12    
6 A Scotch Roman 18.43 0.94 0.23
  B Caslon O.S. 17.49    
7 A Scotch Roman 19.06 1.13 0.42
  B Kabel Lite 17.93    
Table 2: Group 1 is the control group: both forms are set in Scotch Roman, and the difference between the number of paragraphs read in Form A and Form B is 0.71 (column 5). This number functions as the correction factor and is subtracted from the difference found in each of the other comparisons, giving the numbers in column 6. Note that the size of the difference between Form A and Form B is therefore reduced. The data in this table are a subset of the data in Table 1 of Paterson and Tinker (1932, pp. 610–611).
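The correction in Table 2 is simple arithmetic, and a short Python sketch using the figures copied from the table reproduces column 6. The function name `apply_correction` and the data layout are illustrative assumptions, not anything Paterson and Tinker describe; rounding to two decimal places simply matches the precision of the published table.

```python
# Average number of paragraphs read, copied from Table 2:
# group -> (typeface of Form B, Form A mean, Form B mean).
# Form A is always Scotch Roman; Group 1 is the control group.
TABLE_2 = {
    1: ("Scotch Roman", 19.10, 18.39),
    2: ("Garamont", 18.78, 18.14),
    3: ("Antique", 18.81, 18.07),
    4: ("Bodoni", 19.42, 18.51),
    5: ("Old Style", 19.03, 18.12),
    6: ("Caslon O.S.", 18.43, 17.49),
    7: ("Kabel Lite", 19.06, 17.93),
}


def apply_correction(table, control_group=1):
    """Subtract the control group's A-minus-B difference from every group's difference."""
    _, control_a, control_b = table[control_group]
    correction = control_a - control_b
    corrected = {
        group: round((a - b) - correction, 2)
        for group, (_, a, b) in table.items()
    }
    return round(correction, 2), corrected


correction, corrected = apply_correction(TABLE_2)
print(correction)  # 0.71
for group, (typeface, _, _) in TABLE_2.items():
    print(group, typeface, corrected[group])
```

The same single number from Group 1 is subtracted from every other group, whichever readers those groups contain.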

No control group was used in the 1929 line length study. This explains why the 1929 paper reports more significant differences than the subsequent book (Paterson & Tinker, 1940, p. 42), where the correction factor is applied and the differences are reduced in size. In the 1940 publication, variations in line length between 80 mm and 152 mm (i.e. 80, 97, 114, 136 and 152 mm) are said to have little or no effect on speed of reading, while an unusually short line (59 mm) may slow reading. At line lengths of 168 and 186 mm, “the evidence is clear that reading speed is significantly retarded” (Paterson & Tinker, 1940, p. 43). This is an explicit statement that long lines slow down reading. See Table 3 for both the uncorrected and corrected data.

This control group technique is questionable because the correction factor is obtained from one group of readers and then applied to different groups of readers. Even if the experimental conditions are the same for all groups, variability among readers may mean that a correction factor derived from one group is not appropriate for the others. Comparing different groups of readers (a between-subjects comparison) is perfectly valid, but the data do not seem to have been analysed in this way.

Their response to criticism

Tinker and Paterson were aware of their “methodological difficulties” and published a paper on methodological considerations (Tinker & Paterson, 1936). It describes a series of “special experiments” they conducted to address all the issues they could think of, including the control condition and differences among groups of participants. They examine these differences and conclude that they are within suitable limits (Tinker & Paterson, 1936, p. 135). This is questionable (see note 5). Later, Paterson and Tinker (1940, p. 188) acknowledge that “Some critics might believe that these differences would affect the typographical comparisons involved in any one study”. I believe that this is a possibility: I am unconvinced by the data they use to argue that the variations in the average scores of the 7 groups do not affect their results. But we do not know how the results might be affected.

Group (1) Form (2) Line length (3) Average number of paragraphs read (4) Difference between A and B (5) Corrected difference between A and B (6) Percent difference after correction (7)
1 A 80 mm 18.31 1.25 0.75 -4.1
  B 59 mm 17.06      
2 A 80 mm 18.46 0.5 0 0
  B 97 mm 17.96      
3 A 80 mm 18.19 0.96 0.46 -2.5
  B 114 mm 17.23      
4 A 80 mm 18.98 0.93 0.43 -2.3
  B 136 mm 18.05      
5 A 80 mm 18.94 1.14 0.64 -3.4
  B 152 mm 17.8      
6 A 80 mm 18.88 1.47 0.97 -5.1
  B 168 mm 17.41      
7 A 80 mm 18.31 1.88 1.38 -7.5
  B 186 mm 16.43      
Table 3: The numbers in columns 4 and 5 are taken from Table 1 of Tinker and Paterson (1929, p. 211). The numbers in column 7 are taken from Table 14 of Paterson and Tinker (1940, p. 42). Negative values in this final column indicate that the test line length is read more slowly than the standard (80 mm). I have added column 6 to show the effect of the correction factor (0.5) on the difference in the number of paragraphs read. Note that the correction factor comes from a comparison of Form A in 80 mm and Form B in 97 mm (Group 2), not from both forms in 80 mm.
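The same arithmetic can be applied to Table 3. This Python sketch takes the 0.5 correction factor from the Group 2 row (80 mm versus 97 mm), as the caption notes, and also computes a percent difference. How Paterson and Tinker (1940) derived column 7 is not stated here; the formula below, which divides the corrected difference by the Form A mean, is an assumption that happens to reproduce column 7 to one decimal place. That agreement is consistent with, but does not confirm, their calculation. The names `TABLE_3`, `CORRECTION` and `corrected_row` are hypothetical.

```python
# Group -> (test line length in mm, Form A mean at 80 mm, Form B mean at test length),
# copied from columns 3 and 4 of Table 3.
TABLE_3 = {
    1: (59, 18.31, 17.06),
    2: (97, 18.46, 17.96),
    3: (114, 18.19, 17.23),
    4: (136, 18.98, 18.05),
    5: (152, 18.94, 17.80),
    6: (168, 18.88, 17.41),
    7: (186, 18.31, 16.43),
}

# The 0.5 correction factor is the Group 2 difference
# (Form A at 80 mm versus Form B at 97 mm), not a same-length control.
_, a2, b2 = TABLE_3[2]
CORRECTION = a2 - b2


def corrected_row(a, b, correction=CORRECTION):
    """Return (corrected difference, percent difference) for one group."""
    corrected = (a - b) - correction
    # Assumption: percent is relative to the Form A (80 mm) mean and is negative
    # when the test line length is read more slowly than the standard.
    percent = -100 * corrected / a
    # Adding 0.0 turns a rounded -0.0 into 0.0 for display.
    return round(corrected, 2), round(percent, 1) + 0.0


for group, (length_mm, a, b) in TABLE_3.items():
    diff, pct = corrected_row(a, b)
    print(f"Group {group}: 80 mm vs {length_mm} mm -> {diff:.2f} paragraphs, {pct:+.1f}%")
```

By construction, Group 2 ends up with a corrected difference of zero, which is why the 97 mm row shows no effect after correction.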

A better solution to the flaw in their method

Tinker and Paterson spent considerable effort justifying their methods rather than addressing the problem by counterbalancing the conditions across different groups of participants. Counterbalancing avoids confounding the content of the forms with line length and balances out the effects of practice. As a consequence, the results have greater validity because they measure the effects of line length rather than other, unintended factors.

To counterbalance the conditions, instead of using only their single configuration:

Group 1st form Line length 2nd form Line length
Group 1 Form A Standard line length Form B Test line length

three other configurations are added to create a balanced design, with different groups of participants assigned to each configuration:

Group 1st form Line length 2nd form Line length
Group 1a Form A Standard line length Form B Test line length
Group 1b Form B Standard line length Form A Test line length
Group 1c Form A Test line length Form B Standard line length
Group 1d Form B Test line length Form A Standard line length
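A small, noise-free additive model can show why the four-group design works. In this Python sketch, all effect sizes are invented, the names `score`, `CONFIGURATIONS` and `estimated_length_effect` are hypothetical, and every group of readers is assumed to be equivalent (itself an assumption this article questions). Averaging all test-length scores and all standard scores across the four configurations cancels the form and order effects and leaves only the line-length effect.

```python
from statistics import mean

# Hypothetical, noise-free additive model: all numbers are invented for illustration,
# and every group of readers is assumed to be equivalent.
BASE = 18.0  # hypothetical paragraphs read under the reference condition


def score(form, length, position, form_b_penalty, order_effect, test_length_effect):
    """Paragraphs read for one form, line length and reading position (1 or 2)."""
    s = BASE
    if form == "B":
        s -= form_b_penalty       # Form B content assumed harder
    if position == 2:
        s += order_effect         # practice (+) or fatigue (-) on the second form
    if length == "test":
        s += test_length_effect   # the effect we actually want to measure
    return s


# The four counterbalanced configurations from the table above,
# each listed as (form, line length) in reading order.
CONFIGURATIONS = {
    "1a": [("A", "standard"), ("B", "test")],
    "1b": [("B", "standard"), ("A", "test")],
    "1c": [("A", "test"), ("B", "standard")],
    "1d": [("B", "test"), ("A", "standard")],
}


def estimated_length_effect(form_b_penalty, order_effect, test_length_effect):
    """Mean test-length score minus mean standard score across all four groups."""
    scores = {"standard": [], "test": []}
    for sequence in CONFIGURATIONS.values():
        for position, (form, length) in enumerate(sequence, start=1):
            scores[length].append(
                score(form, length, position,
                      form_b_penalty, order_effect, test_length_effect)
            )
    return mean(scores["test"]) - mean(scores["standard"])


# Form B harder by 1 paragraph, a small practice gain on the second form,
# and a true line-length effect of -0.5 paragraphs.
estimate = estimated_length_effect(form_b_penalty=1.0, order_effect=0.3,
                                   test_length_effect=-0.5)
print(round(estimate, 6))  # -0.5
```

Each form appears once with each line length in each position, so the form and order terms contribute equally to both averages and drop out of the difference.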

As I tried to work out why Tinker and Paterson hadn’t used this experimental design, I was surprised to find that they had. In an earlier paper (Tinker & Paterson, 1928), they explain that they used this design, which they describe as the A B B A method of sequence, because it addresses the criticisms outlined above. They state that this method avoids “differences due to difficulty of the alternate forms A and B” and avoids “the presence of any marked practice effect in passing from one trial to the second” (Tinker & Paterson, 1928, p. 362).

Why did they stop using this experimental design? They justify abandoning the A B B A sequence method on the grounds that this permits “a simpler and more straightforward comparison” (Paterson & Tinker, 1929, p. 125). It is possible that the A B B A configuration introduced practical difficulties when administering the test in a classroom, or that the cost of producing the test material was prohibitive at the time. But in retrospect, their approach seems somewhat misguided.

Conclusion

This critique of Tinker and Paterson’s research addresses a very specific issue from the perspective of current experimental psychology, which now has more sophisticated statistical methods than those available at the time of the studies. Nevertheless, this is not simply a matter of hindsight: their method was criticised by their contemporaries. More recent criticism (Berkson & Enneson, 2013), though questioning Tinker’s methodological principles and results, does not refer to the lack of counterbalancing.

From a designer’s perspective, there may be more general and relevant criticisms that could be levelled at these speed of reading tests, such as:

But ultimately, when the design of the study questions the validity of the results, other objections are less relevant.

Some implications for designers and researchers

Tinker and Paterson’s findings on line length are consistent with typographers’ recommendations for good printing practice (e.g. Bringhurst, 2019). These recommendations may be based on the research (e.g. Spencer, 1968; Schriver, 1997) or on the “inherited experience of five hundred years of printing history” (McLean, 1980, p. 47). This agreement suggests that designers should continue to observe the guidance provided by practitioners unless new research contradicts our current conventions. As I mentioned in my previous article, there have been few studies of line length in print since Tinker and Paterson’s work. Now seems to be a good time for researchers to conduct new studies.

References

Berkson, W., & Enneson, P. (2013). Readability: Discovery and disputation. Typography Papers, 9, 117–151. http://typography.network/wp-content/uploads/2023/08/Berkson_Enneson_TypPp_9_Readability_discovery_and_disputation.pdf

Bringhurst, R. (2019). The elements of typographic style (4th ed.). Hartley & Marks.

Cobb, E. K. (1944). The relation between certain phases of reading ability and speed and accuracy in typewriting [Master of Science dissertation, North Carolina University]. NC Digital Online Collection of Knowledge and Scholarship. https://libres.uncg.edu/ir/uncg/f/cobb_emma_1944.pdf

McLean, R. (1980). The Thames and Hudson manual of typography. Thames and Hudson.

Parker, A. J. (2019). The return-sweep in reading [Doctoral thesis, Bournemouth University]. BURO. http://eprints.bournemouth.ac.uk/32170/

Parker, A. J., Nikolova, M., Slattery, T. J., Liversedge, S. P., & Kirkby, J. A. (2019). Binocular coordination and return-sweep saccades among skilled adult readers. Journal of Vision, 19(6), 10. https://doi.org/10.1167/19.6.10

Paterson, D. G., & Tinker, M. A. (1929). Studies of typographical factors influencing speed of reading. II. Size of type. Journal of Applied Psychology, 13(2), 120–130. https://doi.org/10.1037/h0074167

Paterson, D. G., & Tinker, M. A. (1932). Studies of typographical factors influencing speed of reading: X. Style of type face. Journal of Applied Psychology, 16(6), 605–613. https://doi.org/10.1037/h0070644

Paterson, D. G., & Tinker, M. A. (1940). How to make type readable. Harper and Row.

Schriver, K. A. (1997). Dynamics in document design: Creating text for readers. Wiley.

Spencer, H. (1968). The visible word. Royal College of Art.

Tinker, M. A. (1963). Legibility of print. Iowa State University Press.

Tinker, M. A. (1965). Bases for effective reading. Lund Press.

Tinker, M. A., & Paterson, D. G. (1928). Influence of type form on speed of reading. Journal of Applied Psychology, 12, 359–368. https://doi.org/10.1037/h0073699

Tinker, M. A., & Paterson, D. G. (1929). Studies of typographical factors influencing speed of reading: III. Length of line. Journal of Applied Psychology, 13, 205–219. https://doi.org/10.1037/h0073597

Tinker, M. A., & Paterson, D. G. (1936). Studies of typographical factors influencing speed of reading. XIII. Methodological considerations. Journal of Applied Psychology, 20(1), 132–145. https://doi.org/10.1037/h0054333
