Mary Dyson, David Březina

The sequel to exploring disfluency: Do we remember the visual appearance of words?

We described our study exploring disfluency in a previous article and in a presentation at the ICTVC conference in Patras in June 2019. In that study we explored whether designers' greater sensitivity to typographic presentation might influence their judgments of memory for items set in a less legible font (Sans Forgetica), and how well they identify and remember words. We found that the legibility of items influenced designers' judgments of memory more than it influenced those of non-designers. However, in neither group was there a difference in accuracy between the two fonts. Contrary to the prediction of disfluency theory, items in Sans Forgetica were not easier to remember.


Our second study further explores designers’ sensitivity to visual aspects of text. We investigate whether the representation of an item in memory includes the font styling and if this differs between designers and non-designers. Does changing the font between initial reading of an item and its subsequent presentation impair recognition of the item? We conclude that there is some evidence that the font styling is retained in memory when reading, but this requires further research.

Why did we do this research?

Our first study (referred to as study 1 below) had two parts, each consisting of two consecutive tasks: lexical decision and recognition (for a detailed description, see the section 'What we did' in the report of study 1). Each part of study 1 used one font, either Arial or Sans Forgetica, for both tasks. We realised this study design could also be used to explore what happens if the font changes between the two tasks, i.e. if the lexical decision task is in one font and the subsequent recognition task is in another (see Figures 1 and 2). Is recognition of a word or non-word more difficult if the visual appearance of the item does not match how it was previously seen? Might designers be more likely than non-designers to remember the visual representation, i.e. the font used to style the text?


Figure 1: Example screen of the lexical decision task using Arial (Task 1).


Figure 2: Example screen of the recognition task using Sans Forgetica (Task 2).

Existing academic research on memory representations

A distinction has been made between graphemic and semantic analyses (Kolers, 1975, 1976). This distinction is also described as perceptual and conceptual, respectively. A representation in memory which preserves information about the graphemic or perceptual features of text (e.g. font styling) is contrasted with a representation which contains only conceptual information about the meaning of the text (Sheridan & Reingold, 2012).

A technique typically used to explore whether a previous item can influence the response to a subsequent item is called priming. If the first item activates a particular representation in memory, and the subsequent item partially matches it, the response can be facilitated, i.e. responses are faster. With respect to the representations mentioned above, priming may be perceptual or conceptual. The degree of facilitation depends on the match between the two presentations (Roediger & Blaxton, 1987). Some researchers have found reduced perceptual priming when the font style changes, because of the mismatch between the visual representations of corresponding items (Jacoby & Hayman, 1987; Roediger & Blaxton, 1987), but others have found no difference (Rajaram & Roediger, 1993). It therefore remains an open question whether font styling is represented in memory in some way.

Existing academic research on differences due to training

We are not aware of any academic research that has specifically explored the effect of design expertise on memory for font styling. However, since designers are trained to discriminate among font styles, we might predict that they would be more likely to attend to perceptual features, and that this might affect how items are represented in memory.

Our research questions:

What we did

We described the format of these studies in our previous article ‘Exploring disfluency: Are designers too sensitive to harder-to-read typefaces?’. The only difference in format between the current study and study 1 is that here we changed the font between the lexical decision and recognition tasks (see Figure 3). Our analysis includes a comparison with the results of study 1 in order to address the research questions.

We have made the study website and data available on GitHub and licensed them under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).

Figure 3: A schematic comparison between study 1 and the current study 2. The general format of the study did not change. In study 2 we changed the font between the lexical decision (shown in green) and recognition (blue) tasks. Note that in each study there are two versions: one starting with samples in Arial and one starting with Sans Forgetica. Half the participants received the version starting with Arial, and half the version starting with Sans Forgetica.

Our participants

We had 122 participants in this study. We grouped the various categories of designer together, giving two groups: 63 designers and 59 non-designers.

What we found in study 2


Many results were the same as in study 1. Again, participants judged Sans Forgetica as less legible than Arial, and thought they would be better at remembering items in Arial than in Sans Forgetica. Responses to items in Sans Forgetica were slower than responses to items in Arial in both tasks. There were no differences in the accuracy of responses to Arial and Sans Forgetica in either task.

Changing font

Compared with study 1, participants were worse at remembering items when the font changed between lexical decision and recognition. However, participants also made fewer correct word/non-word decisions in the lexical decision task than in study 1. As the lexical decision task is a stand-alone task, its accuracy should not be affected by the change of font in the subsequent recognition items.

Designers and non-designers

In this study, the gap between designers and non-designers in how differently they judged memory for the two fonts was smaller than in study 1, but the pattern was the same: the difference between judgments of memory for Arial and Sans Forgetica was slightly larger for designers than for non-designers.

As in study 1, designers responded more quickly than non-designers in the recognition task. But, unlike study 1, there was no difference between designers and non-designers in speed of responding in the lexical decision task. A new finding was that designers were less accurate in their lexical decision and recognition responses than non-designers. The designers in this study were also less accurate in their responses to the lexical decision task than the designers in the first study.

Words vs non-words

As in study 1, responses to words were faster than responses to non-words in both tasks. Also, in both tasks, setting items in Sans Forgetica slowed responses to non-words more than responses to words. In study 1, both groups of participants remembered non-words better than words. In this study, however, only the non-designers remembered non-words better than words, and they also remembered more non-words than the designers did (see Figure 4).

Figure 4: Non-designers are more accurate at remembering non-words. The accuracy of responses is measured using AUC (area under the ROC curve), which is free of response bias, such as a tendency to respond 'seen' most of the time.
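To illustrate why AUC is not distorted by a tendency to answer 'seen', here is a minimal sketch in Python. It assumes binary 'seen'/'not seen' responses in the recognition task and uses the rank-based (Mann-Whitney) definition of AUC: the probability that a randomly chosen previously seen item gets a higher 'seen' score than a randomly chosen new item, with ties counted as half. The function names and response data are hypothetical, and this is not necessarily the exact procedure used in our analysis.

```python
from itertools import product


def auc(old_scores, new_scores):
    """Rank-based AUC (Mann-Whitney form), a hypothetical helper.

    old_scores: responses to items shown in the lexical decision task (1 = 'seen', 0 = 'not seen')
    new_scores: responses to items not shown before (same coding)
    Returns the probability that a random old item scores higher than a
    random new item, counting ties as half.
    """
    wins = 0.0
    for old, new in product(old_scores, new_scores):
        if old > new:
            wins += 1.0
        elif old == new:
            wins += 0.5
    return wins / (len(old_scores) * len(new_scores))


def raw_accuracy(old_scores, new_scores):
    """Proportion correct: 'seen' for old items, 'not seen' for new items."""
    correct = sum(old_scores) + sum(1 - s for s in new_scores)
    return correct / (len(old_scores) + len(new_scores))


# Hypothetical participant: 3 of 4 old items recognised (hit rate 0.75),
# 1 of 4 new items wrongly called 'seen' (false-alarm rate 0.25).
old = [1, 1, 1, 0]
new = [0, 0, 0, 1]
print(auc(old, new))  # 0.75

# Hypothetical biased participant who answers 'seen' to every item,
# in a test with 12 old items and 4 new items.
always_seen_old = [1] * 12
always_seen_new = [1] * 4
print(raw_accuracy(always_seen_old, always_seen_new))  # 0.75
print(auc(always_seen_old, always_seen_new))           # 0.5
```

With binary responses, this rank-based AUC works out to (1 + hit rate - false-alarm rate) / 2, so a participant who says 'seen' to everything scores 0.5 (chance), even though raw accuracy can look respectable when most test items are old.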

What do these results tell us?

We have replicated some of our results from study 1 regarding Sans Forgetica. Items set in this font are judged less legible and less memorable than items set in Arial, and they slow down reading. But the accuracy of responses in both tasks is not affected by the font; there is no disfluency effect.

Changing the font appears to have made the second study more challenging than study 1. Because items were less well remembered in this study, the visual appearance (font styling) of items may be represented in memory along with their meaning. In study 1, recognition may have been easier because of the perceptual match between items in the two tasks. However, we would not expect the lexical decision task to be more difficult, as it should not be affected by the change of font in the subsequent recognition items. The reduced accuracy of lexical decisions may reflect an unsettling effect of changing fonts, which participants might experience on reaching the second part of the study. Recognition was also poorer in the second part of the study, perhaps due to interference from items seen in the first part. This was not found in study 1 and may again reflect the greater difficulty of this second study.

In the lexical decision task of study 1, participants were slower to respond to non-words in Sans Forgetica than in Arial. In the second study, this also applied to the recognition task. As discussed in the previous report (section 'What do these results tell us?'), non-words in Sans Forgetica need more time to decipher than words because we cannot use our knowledge of words to fill in the letters that are harder to identify. Note that in this second study, the non-words were set in a different font in the lexical decision task and in the recognition task. Seeing them first in Sans Forgetica and then recognising them in Arial did not seem to slow responses, but seeing them first in Arial and then recognising them in Sans Forgetica did. This asymmetry cannot be explained solely by reduced perceptual priming resulting from the change in visual representation from one font to another. Letter identification seems to be slowed down further when a non-word is set in the less legible Sans Forgetica after it has been processed in the more legible Arial.

The differences between designers and non-designers in speed and accuracy might indicate different approaches to the tasks. It is possible that the variability introduced by changing the font between tasks affected the two groups in different ways. Unlike the non-designers, the designers in this study appear to prioritise speed over accuracy, particularly in the recognition task.

The small change between study 1 and this study in the Judgment of Memory results for designers and non-designers stems from designers judging the two fonts slightly less differently and non-designers judging them slightly more differently. In both studies, there is considerable variation within the designer group in the size of the difference between their Judgment of Memory for Arial and their Judgment of Memory for Sans Forgetica. Non-designers are less variable. Looking at the breakdown into categories of designer, the first study has a prevalence of letter designers, whereas the second study includes more graphic designers (see Figure 5). This suggests that letter designers may be even more sensitive to font differences than graphic designers, as we might expect.

Figure 5: Breakdown of number of participants by professional training across the two studies. Study 1 includes more letter designers than other categories of designer whereas study 2 includes more graphic designers.

Comment on the results of the two studies

Although items in Arial were judged consistently as more memorable and easier to read, these items were not better remembered. The nature of the lexical decision task, and the instruction to proceed as quickly and accurately as possible, may have deterred more effortful processing of Sans Forgetica. However, slower responses to items in Sans Forgetica suggest that greater effort was needed. The lack of a positive effect on memory from less legible fonts is not too surprising as there is only weak evidence that hard-to-read materials enhance learning.

We do not claim to be studying 'normal reading'. Non-words were included in the studies purely as a tool for exploring memory for words. The unexpected benefit of including non-words is that they highlighted the problems of using a less legible font. With non-words, readers must rely on letter identification without any contribution from word-level information, i.e. knowledge of the language. This reinforces the advice that designers should use legible fonts for texts where readers are less familiar with the vocabulary, e.g. low-frequency (unusual) words, as less legible fonts will increase the difficulty of reading such words.

Reading research on typographic variables is typically concerned with the ease and efficiency of reading rather than with measuring learning. Our studies tested memory after a short delay and were therefore not tests of long-term retention. However, our studies suggest that, when designing texts whose content needs to be retained, the best strategy is for designers to use legible fonts to minimise the effort readers need to put in. This would apply to educational materials and lengthy stories, for example.

There is some evidence that the font styling is retained in memory when reading, but this requires further research.

Some interesting and unexplained differences emerged between the designers and non-designers. In almost all cases, the divergence was not in response to the different fonts, as we might have expected, but reflected a more general difference in strategy or approach to the tasks. It is possible that this difference is a result of designers' professional training.

References

Barton, J.J.S., Sekunova, A., Sheldon, C., Johnston, S., Iaria, G., & Scheel, M. (2010). Reading words, seeing style: The neuropsychology of word, font and handwriting perception. Neuropsychologia, 48(13), 3866–3877.

Dyson, M.C. (2020). Does perceptual disfluency theory represent a significant challenge to a legibility researcher? Hyphen, 12(18), 17–35.

Geller, J., Davis, S.D., & Peterson, D.J. (2020). Sans Forgetica is not desirable for learning. Memory, 28(8), 957–967.

Jacoby, L.L., & Hayman, C.A.G. (1987). Specific visual transfer in word identification. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13(3), 456–463.

Kolers, P.A. (1975). Specificity of operations in sentence recognition. Cognitive Psychology, 7(3), 289–306.

Kolers, P.A. (1976). Reading a year later. Journal of Experimental Psychology: Human Learning and Memory, 2(5), 554–565.

Rajaram, S., & Roediger, H.L. (1993). Direct comparison of four implicit memory tests. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19(4), 765–776.

Roediger, H.L., & Blaxton, T.A. (1987). Effects of varying modality, surface features, and retention interval on priming in word-fragment completion. Memory & Cognition, 15(5), 379–388.

Sheridan, H., & Reingold, E.M. (2012). Perceptual specificity effects in rereading: Evidence from eye movements. Journal of Memory and Language, 67(2), 255–269.

Taylor, A., Sanson, M., Burnell, R., Wade, K.A., & Garry, M. (2020). Disfluent difficulties are not desirable difficulties: The (lack of) effect of Sans Forgetica on memory. Memory, 28(7), 850–857.
