
Draft of: Daneman, M., Reingold, E. M., & Davidson, M. (1995). Time course of phonological activation during reading: Evidence from eye fixations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 884-898.

The Time Course of Phonological Activation During Reading: Evidence from Eye Fixations

Meredyth Daneman, Eyal M. Reingold, and Monica Davidson



Abstract

Our eye fixation data support a theory of lexical access in which phonological sources of activation and influence are delayed relative to orthographic sources. Unlike proofreading data, which show that readers are less likely to detect homophonic errors (e.g., He was in his silk stocking feat) than nonhomophonic errors (e.g., He was in his silk stocking fate), the eye fixations revealed that readers initially experienced as much difficulty encountering a homophonic error as a nonhomophonic one, a finding which suggests that they use orthographic codes rather than phonological codes to activate word meanings. However, homophony facilitated the recovery process, a finding which suggests that phonology has its influence after lexical access. Experiment 1 showed that the findings were consistent whether the error was the lower frequency homophone (stocking feat) or the higher frequency homophone (feet of courage). Experiment 2 showed that proofreading responses can be unreliable indices of error detection because even when readers fail to make an overt error detection response, their eye fixations reveal that they have detected the error.



This study shows how eye fixation data can be used to reveal the time course of phonological activation during natural silent reading. The eye fixation data from two experiments support a theory of lexical access in which phonological sources of activation and influence are delayed relative to orthographic sources (see also Daneman & Reingold, 1993; McCutchen & Perfetti, 1982), rather than a theory in which phonological codes play an early and/or dominant role (e.g., Daneman & Stainton, 1991; Inhoff & Topolski, in press; Pollatsek, Lesch, Morris, & Rayner, 1992; Perfetti, Bell, & Delaney, 1988; Rayner, Sereno, Lesch, & Pollatsek, in press; Van Orden, 1987; Van Orden, Pennington, & Stone, 1990). The study also demonstrates the importance of providing evidence for phonological processes from a direct reading measure (e.g., Daneman & Reingold, 1993) rather than relying on evidence from a secondary measure such as proofreading (e.g., Daneman & Stainton, 1991). By collecting eye fixation data in conjunction with a proofreading task, the study reveals that an on-line reading measure is a more reliable and sensitive index of the time course of phonological activation during reading than the secondary proofreading measure.

Much of the evidence for the early involvement of phonology comes from tasks showing homophone confusion effects.1 One such task is lexical decision, in which subjects judge whether a given letter string is a word. A typical finding is that subjects take more time to reject pseudohomophone foils, such as brane, than control foils, such as brene (Coltheart, Davelaar, Jonasson, & Besner, 1977; Rubenstein, Lewis, & Rubenstein, 1971). A common explanation for the effect is that the pseudohomophone brane activates the phonological representation /breIn/, which in turn activates the lexical entry for the word brain. The activation of a lexical entry makes brane more difficult to classify as a nonword. Another task showing homophone confusions to single words is Van Orden's (1987) semantic categorization task. In this task, subjects are presented with a category name (e.g., type of flower) followed by a target word (e.g., rows, robs, tulip), and their task is to decide whether or not the target word is a member of the category. The typical finding is that subjects make more false categorization responses to rows, which sounds like the genuine category member, rose, than they do to robs, which is orthographically similar to rose but does not sound like it (Van Orden, 1987; Van Orden, Johnston, & Hale, 1988). Homophone confusions have also been demonstrated in tasks that require readers to make semantic decisions about an entire sentence rather than an isolated word. In this paradigm, readers are required to judge the acceptability of short sentences, with some of the incorrect sentences containing a homophonic word that makes the sentence sound correct (e.g., "She has blond hare"). The typical finding is that readers make more false-positive decisions to "She has blond hare," which sounds like the correct sentence, "She has blond hair," than they do to "She has blond harm," which is orthographically similar to "She has blond hair" but does not sound like it (Coltheart, Avons, & Trollope, 1990; Coltheart, Laxon, Rickard, & Elton, 1988; Doctor & Coltheart, 1980; Johnston, Rugg, & Scott, 1987; Treiman, Freyd, & Baron, 1983). The finding that homophonic words are commonly misinterpreted as their sound-alike mates has been taken as evidence for phonology playing an early and dominant role in accessing word meanings.

Homophonic confusions do not only occur in tasks that involve decisions to lists of unrelated words or simple sentences; they have been demonstrated for realistic, everyday prose as well (Daneman & Stainton, 1991). Daneman and Stainton (1991) had subjects proofread a lengthy and complex prose passage containing inconsistent words that were or were not homophones of consistent ones. Phonology was implicated by the finding that readers were less likely to detect homophonic errors (e.g., Alone at his teller's cage, idle and board...) than nonhomophonic ones (e.g., Alone at his teller's cage, idle and beard...) if they had been familiarized with an error-free version of the text beforehand. Daneman and Stainton (1991) inferred that the locus of this homophone interference was at lexical access rather than later in a sound-based working memory because a concurrent speaking manipulation, which is supposed to interfere with the maintenance of phonological codes in working memory (Baddeley, 1979; Baddeley, Eldridge, & Lewis, 1981; Besner, 1987), did not abolish or even reduce the homophone interference effect.

A recent study by Daneman and Reingold (1993) has challenged the Daneman and Stainton (1991) conclusion that phonological codes play an early, dominant role in accessing word meanings. Although Daneman and Stainton's (1991) proofreading task had greater ecological validity than previous research on homophone confusions, Daneman and Reingold (1993) were concerned that their proofreading requirement changed the nature of the reading process itself. Consequently, Daneman and Reingold (1993) looked for evidence of phonological processes in a task that demanded nothing other than normal reading for comprehension. Subjects read the same 1100-word text used in the Daneman and Stainton (1991) study. However, they were not told that inconsistent words had been introduced into the text, nor were they given explicit instructions to proofread for the inconsistent words as they read; they were simply asked to read for comprehension and their eye fixations were recorded to examine whether phonological involvement is spontaneously revealed in the moment-to-moment computational processes of regular reading. Previous research has shown that readers pause longer on words that are inconsistent with previously read information (Carpenter & Daneman, 1981; Frazier & Rayner, 1982; Just & Carpenter, 1980) and frequently make regressive fixations as they attempt to resolve the inconsistencies (Carpenter & Daneman, 1981). Thus, any additional time spent fixating an inconsistent phrase (e.g., ...idle and board... or ...idle and beard...) relative to the consistent one (e.g., ...idle and bored) could be attributed to the processes involved in inconsistency detection and recovery. Contrary to Daneman and Stainton's (1991) results from their secondary proofreading task, Daneman and Reingold (1993) found no evidence that homophony interfered with the initial detection of homophone errors, whether or not subjects were familiarized on an error-free version of the text first. 
In fact, homophone errors were as disruptive as nonhomophone errors when first encountered, suggesting that they were detected as easily. This lack of phonological interference in the early detection of homophonic errors was taken as evidence against those models that assume phonological sources of activation invariably mediate lexical access (Daneman & Stainton, 1991; Van Orden, 1987). Instead the results suggested that readers bypass phonology, using the orthographic representations for board and beard as a direct route to their contextually inconsistent meanings, "plank" and "facial hair," respectively.

Whereas Daneman and Reingold's (1993) initial detection data did not provide evidence for the early engagement of phonological processes in activating a word's meaning, the post-detection data provided some evidence for the delayed involvement of phonology in the error recovery processes. Consistent with Daneman and Stainton's (1991) findings from the problem-solving or repair version of their proofreading task (see Daneman & Stainton, Experiment 4), Daneman and Reingold (1993) found that homophony facilitated the error recovery processes that are initiated after an inconsistency is detected. The regressive eye fixations showed that readers spent less time reading and rereading phrases containing homophonic errors (e.g., ...idle and board...) relative to phrases containing the orthographically-matched nonhomophonic errors (e.g., ...idle and beard...), presumably because for those cases in which readers had successfully detected the inconsistent impostor word (board), they could exploit the shared phonology (e.g., /bɔrd/) as a route to recovering the correct alternative (bored). These data suggested that phonology has its influence after lexical access. Whether these phonological codes are activated automatically as a by-product of lexical access (McCutchen & Perfetti, 1982), or whether they are deliberately and consciously computed as part of a reader's repertoire of error-recovery heuristics (Carpenter & Daneman, 1981), the data could not reveal. However, the Daneman and Reingold (1993) data did suggest that the ability to exploit these phonological codes appears to be somewhat limited because homophony was not sufficient to facilitate error recovery; orthographic similarity was necessary too. The homophone facilitation effect found for repairing similarly spelled same-length homophone pairs (e.g., bored/board; hair/hare) did not generalize to the less similarly spelled different-length homophone pairs (e.g., weighed/wade; threw/through).
Thus even when it comes to post-detection error recovery processes, readers appear to pay considerable attention to orthographic sources of information as they attempt to interpret and reinterpret inconsistent words.

The present study reports two experiments aimed at exploring the generality of the Daneman and Reingold (1993) findings, as well as corroborating the claim that an on-line reading measure exposes the time course of phonological activation during normal reading in a way that a secondary proofreading measure is not sensitive enough to do. In Experiment 1, we show that the Daneman and Reingold (1993) findings generalize across texts that differ in whether the error is the lower frequency homophone (as in McCrae wore his hare a little too long now that he was going grey...) or the higher frequency homophone (as in A few muskrats and a hair built a house in the marshy field). In Experiment 2, we add a proofreading component to the eye fixation paradigm, and reveal the problems associated with making inferences about phonological processes from a subsidiary proofreading response.

Experiment 1

Daneman and Reingold (1993) provided evidence for the time course of phonological activation during the most natural or typical of silent reading situations possible--reading for comprehension and enjoyment. Whereas data based on more indirect tasks (e.g. Daneman & Stainton, 1991; Van Orden, 1987) have supported a theory of lexical access in which phonological codes play an early and dominant role, Daneman and Reingold's eye fixation data called for a much more restricted and delayed involvement of phonology by showing that homophonic errors were initially as disruptive as nonhomophonic ones, with homophony only making a difference in the post-lexical access error recovery processes for similarly spelled same-length homophones (e.g. hair/hare; feet/feat). In Experiment 1, we attempted to replicate the Daneman and Reingold (1993) findings for same-length homophone pairs and show that they generalize to new stimulus texts and words of different frequencies.

There are two reasons for questioning the generalizability of the Daneman and Reingold (1993) finding that homophonic errors were initially as disruptive as nonhomophonic errors. The first has to do with the number or density of error words in Daneman and Reingold's (1993) experimental text. Although the Russell Wood text (see also Daneman & Stainton, 1991) was a fairly lengthy prose passage (1100 words), the experimental manipulation involved the introduction of 24 error words in Experiment 1 and 32 error words in Experiment 2. One could argue that the relatively large number of error words might have drawn attention to the experimental manipulation and contributed to the easy detection of the homophonic errors. To increase the likelihood that the paradigm was indeed capturing the kinds of spontaneous inconsistency detection and recovery processes that are part and parcel of natural reading and comprehension monitoring, we used two new texts in the present study. These were published short stories by well-known Canadian authors which we will refer to as The Black Queen and The Desjardins, respectively. Both texts were roughly equivalent in length to the 1100 word Russell Wood text (The Black Queen was 1150 words in length; The Desjardins was 1350 words in length), but they had fewer error words introduced into them: 15 in Experiment 1a and 20 in Experiment 1b. Thus, not only could we investigate whether the Daneman and Reingold (1993) findings generalize to new stimulus passages, but we could determine whether homophonic errors are still as disruptive as nonhomophonic errors when they occur with less regularity in the text.

Word frequency is a second reason why some researchers might question the generalizability of the Daneman and Reingold (1993) finding that phonology does not appear to be involved in the initial activation of word meanings. Some researchers have argued that phonology is more likely to be implicated in the processing of low-frequency words (Jared & Seidenberg, 1991; see also Patterson & Coltheart, 1987, for a review). For example, Jared and Seidenberg (1991) showed that the typical homophone interference effect in Van Orden's (1987) semantic categorization task (e.g. miscategorizing rows as "a flower") is largely attributable to cases in which the homophone foil and its mate are both low-frequency words. If phonology is more likely to be implicated in the processing of the slower-to-process low-frequency words (Coltheart, 1978; Katz & Frost, 1992; McCusker, Hillinger, & Bias, 1981; Patterson & Coltheart, 1987; Seidenberg, 1985), then one might expect phonology to interfere with the initial detection of only the low-frequency homophone error words in Daneman and Reingold's (1993) natural reading task. And one might be tempted to conclude that Daneman and Reingold's (1993) failure to find a homophone interference effect in the initial detection of homophone errors could be attributed to the fact that many (if not all) of their homophone errors were high-frequency words. We doubt that high word frequency was responsible for the lack of a homophone interference effect in Daneman and Reingold's initial detection data. Although word frequency was not systematically manipulated in that study, a wide range of word-frequency counts (Kucera & Francis, 1967) was represented in the pool of 48 target words,2 and item analyses showed that the easy initial detection of homophonic errors was highly significant across the entire pool of items. 
Nevertheless, findings from the semantic categorization task (Jared & Seidenberg, 1991) point out the need for systematically manipulating the frequency of the homophone error word as well as the frequency of the text-word it replaces. The word frequency manipulation in the present study involved (1) 30 homophone pairs in which the first member of the pair had a considerably higher frequency of occurrence than its mate (e.g., hair/hare; feet/feat; real/reel), and (2) two texts: The Black Queen and The Desjardins. The 30 higher frequency homophones (e.g., hair, feet, real) all appeared in the Black Queen text; the 30 lower frequency homophones (e.g., hare, feat, reel) all appeared in the Desjardins text. This meant that when homophone errors were introduced into the text containing the higher-frequency homophones (Black Queen), they were always the lower-frequency word of the pair (e.g., McCrae wore his hare a little too long...); when homophone errors were introduced into the text containing the lower-frequency homophones (Desjardins), they were always the higher-frequency word (e.g., A few muskrats and a hair built a house in the marshy field). If phonology is more likely to be recruited in the activation of meanings for lower-frequency words (e.g., Seidenberg, 1985), then subjects who read the Black Queen text (the one with the higher-frequency text-words and lower-frequency errors) should find it less easy to detect the "sound-okay" errors than subjects who read the Desjardins text (the one with the lower-frequency text-words but higher-frequency homophone errors). In other words, readers of the Black Queen should show less disruption in their eye movements when initially encountering an incorrect homophone relative to a correct one than should readers of the Desjardins text, because the incorrect homophones in the Black Queen were all the lower-frequency homophones.

Experiments 1a and 1b were designed to parallel Daneman and Reingold's (1993) Experiments 1 and 2, respectively. In Experiment 1a, half the target words appeared in their correct form, half as the contextually inconsistent homophone mate. In Experiment 1b, one third of the target words appeared in their correct form, one third as homophone errors, and one third as orthographically similar nonhomophone errors. The rationale for including an experiment without the nonhomophone control errors was the same as outlined by Daneman and Reingold (1993). The potentially easy-to-detect nonhomophone errors might draw attention to the manipulation and somehow interfere with natural reading; by excluding these error types in Experiment 1a, we could observe any possible phonological influences uncontaminated by the other errors. Like Daneman and Reingold (1993), we wanted to examine the pattern for homophone errors across texts that did and did not contain the nonhomophonic errors. And at the same time, by including a text condition that contained only the lower-frequency homophonic errors (The Black Queen text in Experiment 1a), we were giving phonology its maximum opportunity to camouflage the errors.

Method

Subjects.

The subjects were 50 University of Toronto undergraduates who were all fluent speakers of English; 20 participated in Experiment 1a, and the other 30 participated in Experiment 1b. Each subject was tested individually in a session lasting approximately 50 minutes. In each experiment, half the subjects were randomly assigned to read the story called The Black Queen; the other half read the story called The Desjardins.

Materials and Procedure.

The experimental manipulation involved 30 homophone word pairs with asymmetric word frequencies (e.g., hair/hare; feet/feat; real/reel). The mean Kucera and Francis (1967) frequency count for the higher-frequency member of the pair (e.g., hair, feet, real) was 336 occurrences per million (median = 162, SD = 520); the mean frequency count for the lower-frequency member of the pair (e.g., hare, feat, reel) was 43 occurrences per million (median = 14, SD = 85). All 30 homophone pairs and their orthographically-matched controls are listed in Appendix A. Note that our frequency manipulation involved a manipulation of the relative word frequency of the text-word and the error replacement rather than a manipulation of their absolute frequency, and even our lower-frequency homophones were sufficiently common that readers would be likely to know their meaning and spelling (e.g., hair/hare, feet/feat, but not pigeon/pidgin, or bridal/bridle as in Jared & Seidenberg, 1991). Although the members of a homophone pair differed in word frequency, they were orthographically similar to one another in that each member of the pair was spelled with the same initial letter (e.g., hair-hare but not urn-earn), and each member of the pair was the same length as the other (e.g., hair-hare but not wade-weighed). The final criterion for selection was that for each homophone pair, a nonhomophonic control word could be created such that it shared the same consonant sounds as the correct word and its homophone mate, but differed only in the vowel sound (e.g., hire for hair/hare and fate for feet/feat). This consonant-same manipulation ensured that the homophonic and nonhomophonic error words had equal orthographic similarity to the correct word, and that the nonhomophonic control was as phonologically similar to the homophones as it could be (see Daneman & Stainton, 1991).
Because the nonhomophone controls were matched for orthographic and phonological similarity rather than for frequency, the same nonhomophone control was used for both members of a homophone pair.3 The 30 higher-frequency homophones all appeared in (or were edited into) the short story The Black Queen by Barry Callaghan (1988); the 30 lower-frequency homophones all appeared in (or were edited into) the short story The Desjardins by Duncan Campbell Scott (1988). Both were engaging short stories, roughly equivalent in length to Daneman and Reingold's (1993) 1100 word Russell Wood text; the edited version of The Black Queen was 1150 words in length and the edited version of The Desjardins was 1350 words in length. In both stories, the 30 homophone target words were distributed across the entire story.

In Experiment 1a, subjects read a version of the Black Queen or Desjardins text in which 15 of the homophonic words appeared in their original contextually correct form, and 15 appeared as homophone errors. Counterbalancing of target words across the two word forms (correct word or homophone error) was accomplished by creating two versions of the Black Queen and Desjardins stories; a target word that appeared in its contextually correct form in the one version appeared as its homophone mate in the other. Each subject was randomly assigned to read one of the two error-filled versions of the Black Queen story or they were assigned to read one of the two versions of the Desjardins story, such that there were equal numbers of subjects reading each version.

In Experiment 1b, subjects read a version of the Black Queen or Desjardins text in which 10 of the target words appeared in their correct form, 10 as homophone errors, and 10 as nonhomophone errors matched for orthographic similarity to the correct word. Counterbalancing of target words was accomplished by creating three versions of the text; a target word appeared in a different form (correct word, homophone error, nonhomophone error) in each version. Each subject was randomly assigned to read one of the three error-filled versions of the Black Queen story or the Desjardins story, such that there were equal numbers of subjects reading each version.
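
The counterbalancing of word forms across versions in Experiments 1a and 1b can be illustrated with a short sketch. The text specifies only that each target word appeared in a different form in each version and that the forms were equally represented within a version. The rotation rule used here (item index plus version index, modulo the number of forms) is an assumption chosen because it satisfies both constraints, and the function and variable names are hypothetical.

```python
from collections import Counter

def build_versions(n_items, forms):
    """Rotate word forms across items so every version contains each form equally often.

    Assumed rule (not stated in the text): in version v, item i takes
    forms[(i + v) % len(forms)].
    """
    k = len(forms)
    if n_items % k != 0:
        raise ValueError("n_items must be divisible by the number of forms")
    return [[forms[(i + v) % k] for i in range(n_items)] for v in range(k)]

# Experiment 1a: 30 targets, two forms, two versions per story.
versions_1a = build_versions(30, ["correct", "homophone"])
# Experiment 1b: 30 targets, three forms, three versions per story.
versions_1b = build_versions(30, ["correct", "homophone", "nonhomophone"])

for name, versions in (("1a", versions_1a), ("1b", versions_1b)):
    for v, assignment in enumerate(versions, start=1):
        print(name, "version", v, dict(Counter(assignment)))
```

Under this rule each version of a story in Experiment 1a contains 15 correct words and 15 homophone errors, and each version in Experiment 1b contains 10 words of each form. A strict rotation makes the forms alternate in a fixed order through the text, so an actual design might instead shuffle the assignment of items to rotation slots.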

The procedure was identical for both experiments. Subjects were told that they would be presented a short story on successive screens of a computer monitor. They were instructed to read the story silently at their own pace, making sure that they understood it well enough to answer questions about its content later. The text was displayed on a VGA monochrome monitor in conventional upper and lower case black font with white background. In the Black Queen condition, there were 22 screens of text, each containing no more than eight double-spaced lines of text; in the Desjardins condition, there were 31 screens of text. Subjects controlled the rate of presentation of each screen by pressing a start button to initiate presentation of the screen display and a stop button to remove it.

Subjects viewed the screen with their heads positioned in a chin rest (to minimize head movements). Viewing was binocular but only the position of the right eye was measured and recorded. Subjects' eye fixations were recorded by an Iscan (Model RK-416) video-based eye tracking system which calculated the x and y coordinates of the reader's point of regard every 16.7 milliseconds. A 386 IBM-compatible microcomputer was used to record the eye movement data as well as to display the stimulus text on the subject's monitor and on the experimenter's monitor. In addition to the stimulus text, the experimenter's monitor displayed the subject's gaze position in real time via an overlaid circular cursor measuring one degree of visual angle in diameter. This display enabled the experimenter to monitor the quality of the subject's calibration throughout the experiment so that a recalibration could be implemented during the course of the experiment if necessary. Prior to reading, a formal nine-point calibration procedure assured that the tracker was accurate to one-half degree of visual angle to either side of the reader's fixation center (an area subtended by approximately 1.1 characters of print). For an in-depth description of the calibration system and other features of our eye tracker's capabilities, see Stampe (1993).

After completing the eye-tracking phase, subjects were given two tests, one that tested their comprehension of the story they had read, and a second that tested their knowledge of how the 30 homophone pairs were spelled. Comprehension of the story was tested with ten questions of the following sort: "What are the names of the two men in the Black Queen story?" "Which of the two men wore Cuban heels?" In Experiment 1a, the mean comprehension score was 8.05 out of a possible 10 (SD = 1.77) for the Black Queen text, and 7.85 (SD = 2.47) for the Desjardins text; in Experiment 1b, the mean comprehension scores were 8.33 (SD = 1.60) for Black Queen and 8.16 (SD = 1.54) for Desjardins. The reasonably high performance on the comprehension check indicated that readers had followed instructions to read for understanding. The purpose of the homophone spelling test was to ensure that any failure to detect a homophone error (e.g., hare substituted for hair, or hair substituted for hare) could not have been attributed to lack of knowledge about how the two different words were spelled. Subjects were given 30 fill-in-the-blank items of the following sort: (a) Her _____ was long enough to reach her knees (hare or hair); (b) Saving the woman's life was a daring _____ (feat or feet). In each case, their task was to circle which of the two words belonged in the sentence. For half the items, the correct word was the higher-frequency homophone; for the other half, the correct word was the lower-frequency homophone. A second version of the spelling test was created by constructing 30 new items, each of which required the opposite solution to its counterpart in the first version: (a) The _____ hopped through the forest (hare or hair); (b) Most people walk with their _____ (feat or feet). Subjects were randomly assigned to complete one of the two versions of the spelling test.
In Experiment 1a, the mean score on the homophone spelling test was 29.30 out of a possible 30 (SD = 0.98); in Experiment 1b it was 29.68 (SD = 0.71); the almost perfect performance indicated that any failure to detect homophone errors could not have been attributed to lack of knowledge about how the 30 homophone word pairs were spelled.

Data Analysis.

Three dependent measures were used to determine whether an incorrect word was detected and error recovery processes initiated. They were (a) gaze duration on the target word; (b) total time on the target word; and (c) total repair time. An example from three readers' eye fixation protocols will illustrate how the three dependent measures were computed. Figure 1 shows the three readers' eye fixations while reading the phrase ...wore his hair/hare/hire a little.... In each case, the sequence of fixations is denoted by the successive numbers below the word being fixated, with the duration of each fixation (in milliseconds) indicated in parentheses below the associated fixation. (The durations of consecutive fixations on a word have been summed together; see Carpenter & Daneman, 1981; Just & Carpenter, 1980.) The gaze duration on the target word was simply the time spent fixating the target word when first encountered; for the reader who saw the hair version in Figure 1, gaze duration was 250 ms; for the reader who saw hare it was 334 ms; and for the reader who saw hire it was also 334 ms (the sum of fixations 3 and 4).4 Total time on the target word included the gaze duration plus any subsequent time spent in regressive fixations to it; for the reader of hair, total time was still 250 ms because hair was not refixated; for the reader of hare it was 551 ms (the sum of fixations 3 and 5); and for the reader of hire it was 767 ms (the sum of fixations 3, 4, and 7).
Total repair time included all consecutive fixations, forward and regressive, from the first fixation on the target word up to but not including any fixations in advance of the target word once the reader resumed reading in a forward direction (that is, did not regress back to the target word or to any word preceding the target); for the reader of hair total repair time included only the 250 ms fixation on the target word because no regressions were initiated; for the reader of hare, total repair time was 968 ms (the sum of fixations 3 - 5); for the reader of hire, it was 1417 ms (the sum of fixations 3 - 9). Only 1.4 % of the trials yielded unusable data because of poor calibration or because the reader did not fixate the target word.
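
The three dependent measures can be expressed as a short computation over a reader's fixation sequence. The following is a minimal sketch under stated assumptions, not the analysis software used in the study: the function name, the (word position, duration) data layout, and the restriction of the input to the local stretch of text around the target are all hypothetical. The example reproduces the values reported for the reader of hare. The durations of fixations 3, 4, and 5 (334, 417, and 217 ms) follow from the reported sums, whereas the location of fixation 4 and the durations of fixations 1, 2, and 6 are invented for illustration.

```python
def fixation_measures(fixations, target):
    """Return (gaze duration, total time, total repair time) in ms.

    fixations: list of (word_position, duration_ms) in temporal order,
               assumed to cover only the local stretch of text around the target.
    target:    word position of the target word.
    """
    positions = [p for p, _ in fixations]
    if target not in positions:
        return None  # target never fixated: the trial is unusable
    first = positions.index(target)

    # Gaze duration: consecutive fixations on the target when first encountered.
    gaze = 0
    i = first
    while i < len(fixations) and fixations[i][0] == target:
        gaze += fixations[i][1]
        i += 1

    # Total time: every fixation on the target, including regressions back to it.
    total = sum(d for p, d in fixations if p == target)

    # Total repair time: from the first target fixation through the last fixation
    # on the target or any earlier word; fixations after that point belong to
    # resumed forward reading and are excluded.
    last = max(j for j in range(first, len(fixations)) if positions[j] <= target)
    repair = sum(d for _, d in fixations[first:last + 1])
    return gaze, total, repair

# Word positions: wore=0, his=1, hare=2, a=3, little=4.
hare_reader = [
    (0, 220),  # fixation 1 (duration hypothetical)
    (1, 200),  # fixation 2 (duration hypothetical)
    (2, 334),  # fixation 3: first encounter with the target
    (1, 417),  # fixation 4: placed on "his" as an assumption
    (2, 217),  # fixation 5: regression back to the target
    (3, 230),  # fixation 6: forward reading resumes (duration hypothetical)
]
print(fixation_measures(hare_reader, target=2))  # (334, 551, 968)
```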

--------------------------------------

In draft, Figure 1 would appear here

--------------------------------------

Results and Discussion

Experiment 1a

Table 1 demonstrates the effect of homophone errors on the reader's eye fixation behavior. As seen in Table 1, all three dependent measures showed that readers took more time to process a homophone if it was inconsistent with the context, thereby providing strong evidence for the spontaneous detection of homophone errors during normal reading. The first dependent measure, gaze duration on the target word, provided evidence that the detection was immediate, that is, that it occurred when the reader first encountered the error, rather than later on, say at the end of a clause or sentence (see also Carpenter & Daneman, 1981; Daneman & Reingold, 1993; Just & Carpenter, 1980). As seen in Table 1, readers spent on average 346 ms when first encountering a homophone error as compared to only 259 ms fixating the correct word, an 88 ms difference that was highly significant across subjects, F1(1,18) = 35.48, MSe = 2113, p < .001, and across items, F2(1,29) = 28.91, MSe = 7580, p < .001, and presumably reflected the difficulty readers were experiencing integrating the inconsistent word (e.g., hare) with the prior text (McCrae wore his hare...).

In order to determine whether the homophone disruption effect generalized across high and low frequency homophones, we need to look at the relative size of the homophone disruption effect for the two stimulus passages. The main effect of passage (Black Queen versus Desjardins) was not significant (subject and item Fs < 1); however, a main effect would not be indicative of a standard frequency effect because each passage contained both high frequency and low frequency target words; in the case of the Black Queen text, the correct target words were the higher-frequency homophones (e.g., hair) and the error words were the lower-frequency homophones (e.g., hare substituted for hair); in the case of the Desjardins text, the correct target words were the lower-frequency homophones (e.g., hare) and the error words were the higher-frequency homophones (e.g. hair substituted for hare). Consequently, a main effect of passage (collapsed across target word type) would say nothing about word frequency effects per se, but would simply be indicative of an overall difference in passage difficulty. Of course, the purpose of this study was not to investigate word frequency effects per se; the purpose was to investigate whether word frequency has any influence on the relative ease of detecting homophone errors. To determine whether the relative frequency of the homophone error word had any effect on the size of the homophone disruption effect we need to look for possible interactions between passage (Black Queen versus Desjardins) and target word type (correct word, homophone error). 
For example, if phonology is more likely to be recruited in the activation of meanings for lower-frequency words, making lower-frequency homophone errors less conspicuous (less disruptive) than higher-frequency homophone errors, then the processing cost of an error (the difference in gaze durations for errors versus correct homophones) should be smaller in the Black Queen text than in the Desjardins text, because the error words in the Black Queen text were the lower-frequency homophones. As Table 1 shows, there was no evidence of an effect of frequency on the disruptiveness of homophone errors because the processing cost of an error was almost identical in the two texts; readers spent an additional 87 ms processing a lower-frequency homophone error than its contextually consistent mate in the Black Queen text (352 ms vs. 265 ms), and they spent an additional 86 ms processing a higher-frequency homophone error than its contextually consistent mate in the Desjardins text (339 ms vs. 253 ms), subject and item interaction Fs < 1. The lack of interaction between passage and target word type shows that the relative disruptiveness of the lower-frequency homophone errors in the Black Queen text was comparable to the relative disruptiveness of the higher frequency homophone errors in the Desjardins text, thus suggesting that the easy detection of homophone errors generalized across low frequency and high frequency words.
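The logic of the passage x target word type comparison can be illustrated with the gaze duration cell means reported above. The sketch below only reproduces the arithmetic of the processing cost and of the difference between the two costs; it is not the ANOVA itself, and the variable and function names are ours.

```python
# Gaze duration cell means (ms) as reported in the text for Table 1.
gaze_ms = {
    "Black Queen": {"correct": 265, "homophone_error": 352},  # errors are lower-frequency homophones
    "Desjardins": {"correct": 253, "homophone_error": 339},   # errors are higher-frequency homophones
}

def processing_cost(cells):
    """Extra gaze time on a homophone error relative to its correct mate."""
    return cells["homophone_error"] - cells["correct"]

costs = {passage: processing_cost(cells) for passage, cells in gaze_ms.items()}

# A frequency effect on error detection would show up as a difference
# between the two costs (the interaction contrast).
interaction_contrast = costs["Black Queen"] - costs["Desjardins"]

print(costs)
print(interaction_contrast)
```

A contrast near zero, as here, is what the nonsignificant interaction reflects: the cost of an error was essentially the same whether the error word was the lower-frequency or the higher-frequency member of the pair.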

The same pattern of results was evident for the dependent measures that included regressive fixations. As seen in Table 1, total time spent on the target word was much greater for incorrect homophones (M = 560 ms) than for correct ones (M = 297 ms), F1(1,18) = 43.47, MSe = 15823, p < .001, and F2(1,29) = 86.40, MSe = 23653, p < .001; and total repair time was also much greater for incorrect homophones (M = 1158 ms) than for correct homophones (M = 415 ms), F1(1,18) = 29.15, MSe = 189143, p < .001, and F2(1,29) = 48.89, MSe = 341847, p < .001. And again, as was the case for the gaze duration data, there was no main effect of passage type (subject and item Fs < 1), and the passage x target word type interaction was not significant for either dependent measure (all subject and item Fs < 1), a finding which suggests that the processing cost for the lower-frequency homophone errors in the Black Queen text was no different from the processing cost for the higher-frequency homophone errors in the Desjardins text. Although we have no overt measure of whether or not readers successfully repaired the inconsistency, the lack of difference in the time spent reading and rereading phrases containing lower-frequency homophone errors versus higher-frequency homophone errors suggests that readers could recover from low-frequency and high-frequency error words with equivalent ease.

-----------------------------------

In draft, Table 1 would appear here

-----------------------------------

Experiment 1a ruled out the possibility, albeit an unlikely one, that readers would fail to detect all "sound-okay" errors during normal silent reading for comprehension, even if the errors were all relatively low frequency words as in the Black Queen text. Instead, Experiment 1a provided strong evidence that readers are able to detect at least some substantial proportion of the low frequency and high frequency homophonic errors (e.g., hare in McCrae wore his hare a little too long... and hair in ...a few muskrats and a hair built a house in the marsh field), a finding which suggests that readers pay attention to orthographic codes in determining the meanings for words. However, phonological processes would still be implicated if homophonic errors (e.g., hare or hair) went unnoticed more frequently than orthographically matched nonhomophonic errors (e.g., hire), or if there was a difference in recovery time for the homophone versus nonhomophone errors. This possibility was tested directly in Experiment 1b.

Experiment 1b

Table 2 demonstrates the effect of the homophone and nonhomophone errors on the reader's eye fixation behavior. As seen in Table 2, the data closely replicated Daneman and Reingold's (1993) data for same-length homophone pairs and showed that they generalized across higher frequency and lower frequency homophonic error words. In a nutshell, the data revealed that the Black Queen and Desjardins readers initially experienced as much difficulty when encountering a homophonic error as a nonhomophonic one; however, homophony facilitated the recovery process.

-----------------------------------

In draft, Table 2 would appear here

-----------------------------------

Initial detection: The gaze durations revealed a significant main effect for type of target word, F1(2,56) = 19.15, MSe = 1746, p < .001, and F2(2,58) = 9.25, MSe = 6917, p < .001. As seen in Table 2, readers spent on average 269 ms when first encountering a contextually consistent word, whereas they spent an additional 47 ms on a homophone error and an additional 63 ms on a nonhomophone error. Pair-wise t-tests showed that the main effect could be attributed to the difference between processing a semantically consistent target word versus a semantically inconsistent one. Readers initially took longer to process a homophone error than its contextually correct homophone mate, a finding that was significant across subjects, t(29) = 4.58, p < .001, and items, t(29) = 3.14, p < .01; similarly, they took longer to process a nonhomophone error than the contextually correct word, subject t(29) = 6.42, p < .001, and item t(29) = 4.74, p < .001. However, there was no difference in initial processing time for homophone versus nonhomophone errors, subject t(29) = 1.37, p > .17, item t(29) = 1.01, p > .30, a result which replicates the Daneman and Reingold (1993) finding that homophonic errors were as disruptive as nonhomophonic errors, and suggests that homophonic errors were detected as easily.
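The distinction between the subject analyses and the item analyses reported here can be sketched as follows. The same trial-level gaze durations are averaged once within each subject and once within each item, and a paired comparison is then run on each set of means. This is a minimal illustration under our own assumptions: the record layout, the function names, and the trial values are hypothetical and are not data from the study.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

def condition_means(trials, unit):
    """Mean gaze duration per condition within each unit ('subject' or 'item')."""
    cells = defaultdict(lambda: defaultdict(list))
    for trial in trials:
        cells[trial[unit]][trial["condition"]].append(trial["gaze_ms"])
    return {u: {c: mean(v) for c, v in conds.items()} for u, conds in cells.items()}

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two equal-length lists."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

def compare(trials, unit, cond_a, cond_b):
    """Paired comparison of two conditions, treating `unit` as the random factor."""
    means = condition_means(trials, unit)
    units = sorted(u for u, m in means.items() if cond_a in m and cond_b in m)
    return paired_t([means[u][cond_a] for u in units],
                    [means[u][cond_b] for u in units])

# Hypothetical trials: (subject, item, condition, gaze duration in ms).
rows = [
    ("s1", "i1", "homophone", 320), ("s1", "i2", "correct", 260), ("s1", "i3", "homophone", 340),
    ("s2", "i1", "correct", 250), ("s2", "i2", "homophone", 330), ("s2", "i3", "correct", 270),
    ("s3", "i1", "homophone", 310), ("s3", "i2", "correct", 280), ("s3", "i3", "homophone", 300),
]
trials = [dict(zip(("subject", "item", "condition", "gaze_ms"), r)) for r in rows]

t_subjects, df_subjects = compare(trials, "subject", "homophone", "correct")
t_items, df_items = compare(trials, "item", "homophone", "correct")
print(t_subjects, df_subjects)
print(t_items, df_items)
```

Reporting both analyses guards against an effect that is carried by a few subjects or by a few items only.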

As in Experiment 1a, the gaze duration data showed that the early detection of homophone errors was not affected by word frequency. There was no significant interaction between target word type and passage (subject and item ps > .15); that is, the lower-frequency Black Queen homophone errors were as disruptive as the higher-frequency Desjardins homophone errors, suggesting that both types of homophone errors were detected as easily as the nonhomophone errors. In contrast to the conclusions based on proofreading responses (Daneman & Stainton, 1991), the results from our on-line reading measure provide no evidence for a homophone interference effect. Like Daneman and Reingold (1993), we take this lack of phonological interference in the early detection of homophonic errors as evidence against those models that assume phonological sources of activation invariably mediate lexical access (Daneman & Stainton, 1991; Van Orden, 1987).

Our data are also inconsistent with those of Pollatsek et al. (1992) who used an eye movement parafoveal previewing paradigm to argue for the early involvement of phonology in word identification (see also Rayner et al., in press). In the Pollatsek et al. (1992) study, processing of a target word (e.g., beech) was facilitated if a homophone of that target word (e.g., beach) had been presented as a preview in the parafovea more so than if a visually similar control (e.g., bench) had been presented as a preview in the parafovea. The Pollatsek et al. data are inconsistent with our data because they suggest that phonological codes are activated very early in the word identification process, even before the word in question is fixated foveally. Interestingly, however, the advantage of a homophone preview over a visually similar preview was only statistically reliable for the first fixation on the target word but not for the gaze duration on the target word (see Pollatsek et al., 1992, Experiment 2). We would like to suggest that even if the previewing paradigm has provided evidence for a very early (nonlexical) involvement of phonology in the word encoding process, it has not provided evidence for the involvement of phonological codes in the subsequent process of accessing a word's meaning. Based on the evidence from our naturalistic inconsistency detection paradigm, we would argue against models of word identification that implicate phonology in the process of lexical access.

Error Recovery: Although there was no evidence for the early engagement of phonological processes in lexical access, there was evidence for the delayed involvement of phonology in error recovery. When the post-detection error-recovery fixations were included in the analysis, a homophone effect finally emerged. In the interests of brevity, only the results for total repair time will be presented in detail because the other measure of recovery time, total time on the target word, produced the same pattern of results. The analysis of variance (ANOVA) on total repair time revealed a significant main effect of type of target word, F1(2,56) = 47.48, MSe = 171952, p < .001, and F2(2,58) = 58.47, MSe = 273354, p < .001. The Black Queen text appeared to be easier to comprehend than the Desjardins text in that time spent on all three kinds of Black Queen target words was shorter than time spent on the corresponding Desjardins target words, F1(1,28) = 3.79, MSe = 513076, p < .063, and F2(1,29) = 11.49, MSe = 316902, p < .01. (The passage difficulty effect can be seen most clearly for the nonhomophone control errors because they are the identical words in the two texts--for example, hire which replaces hair in Black Queen and hare in Desjardins; readers spent an average of 1210 ms in repair time on an inconsistent phrase containing a nonhomophone error in the Black Queen text, but 1554 ms on an inconsistent phrase containing the same nonhomophone error in the Desjardins text.) However, most important for the purposes of this study, there was no interaction between type of target word and passage (subject and item Fs < 1), suggesting that the processing cost of homophone errors was consistent across low frequency (Black Queen) and high frequency (Desjardins) errors. As was the case for the gaze duration data, pair-wise t-tests showed that readers took longer to process both kinds of semantically inconsistent target words relative to the semantically consistent ones. 
Readers took 780 ms longer in repair time for a homophone error than for its contextually correct homophone mate, subject t(29) = 8.34, p < .001, item t(29) = 8.74, p < .001. Readers took 989 ms longer in repair time for a nonhomophone error than for the contextually correct word, subject t(29) = 8.05, p < .001, item t(29) = 11.55, p < .001. However, unlike the gaze duration data which showed no difference in initial processing time for homophone versus nonhomophone errors, the repair time data revealed that readers spent significantly less time (209 ms) repairing a homophonic error than a nonhomophonic one, subject t(29) = 3.87, p < .001, and item t(23) = 3.31, p < .001, a finding which replicates the Daneman and Reingold (1993) pattern. Of course we have no overt measure of whether our readers successfully resolved the inconsistency by recovering the correct word; however, the shorter time spent reading and rereading phrases containing homophonic errors (e.g., McCrae wore his hare...) relative to phrases containing the orthographically-matched nonhomophonic errors (e.g., McCrae wore his hire...) strongly implies that readers were able to recover from the "sound-okay" errors more easily. Because the two kinds of errors (e.g., hare, hire) were equated for their orthographic similarity to the correct word (e.g., hair), but only the former (e.g., hare) shared the same phonological representation as the correct word (e.g., hair), we attribute the facilitation effect to the availability of the shared phonology /hɛr/ and its usefulness in providing readers with a route to recovering the correct alternative. Daneman and Stainton (1991) showed that homophony facilitated error recovery when readers were explicitly asked to repair the errors they encountered while reading. Our replication of this homophone facilitation effect shows that readers initiate error recovery heuristics spontaneously during normal reading for meaning.

Conclusions.

The results of Experiments 1a and 1b replicated the Daneman and Reingold (1993) results and showed that they generalize to new stimulus texts and to texts that have a much lower density of error words. Like Daneman and Reingold (1993), we interpret our data as support for a theory of lexical access in which phonological sources of activation and influence are delayed relative to orthographic sources (see also Coltheart, 1978; McCusker et al., 1981; McCutchen & Perfetti, 1982) rather than a theory in which phonological codes play an early and important role in activating word meanings (e.g., Daneman & Stainton, 1991; Inhoff & Topolski, in press; Pollatsek et al., 1992; Rayner et al., in press; Van Orden, 1987).

On the issue of whether the results generalize across low frequency and high frequency words, we need to be a little cautious in our conclusions. There are several aspects to the data that suggest caution. First of all, we did not get a standard word frequency effect for correct target words in Experiment 1a; as seen in Table 1, subjects spent 265 ms in gaze duration on the high frequency correct homophones (the ones in Black Queen), whereas they spent only 253 ms on the low frequency correct homophones (those in Desjardins). Although the standard word frequency effect for correct homophones was obtained in Experiment 1b (Table 2 shows a 38 ms advantage for the high frequency correct homophones), the lack of consistency across experiments indicates that our word frequency manipulation may not have been as powerful as it could be. In other words, skeptics might argue that had we included even lower frequency homophones, we may have shown a pattern of results that supported the early engagement of phonological processes in lexical access, namely, one in which readers failed to detect the very low frequency homophonic errors as easily as they detected the nonhomophonic errors. And indeed, even though our data showed no statistical support for any effect of relative word frequency on the initial detection of homophone errors, there was a hint (albeit a nonsignificant one) that the lower frequency homophone errors in Experiment 1b were less disruptive and therefore more difficult to detect than were their orthographic controls (294 ms versus 330 ms; see Table 2). These trends suggest two directions for future research: the inclusion of lower frequency homophones than the ones used in this study; and the inclusion of low frequency homophones that have low frequency mates (e.g., pigeon/pidgin; bridal/bridle).
Rather than manipulating relative word frequency as was done here, word frequency could be manipulated orthogonally to include all four kinds of homophone pairs: (1) low-frequency homophone error/contextually correct high frequency homophone mate; (2) low-frequency homophone error/low frequency homophone mate; (3) high-frequency homophone error/high frequency homophone mate; (4) high-frequency homophone error/low frequency homophone mate (see for example, Jared & Seidenberg, 1991). A comprehensive design such as this would allow one to investigate frequency's effect on phonological activation more fully.

Of course, there would be need for caution in interpreting data from designs that included very rare or infrequent homophones. Even if we designed an experiment to include such low frequency words and found that the low-frequency homophone errors were detected less readily than spelling controls, we would be reluctant to interpret the result as evidence for the early engagement of phonology, because a plausible alternative is that subjects were not familiar enough with the way the infrequent homophone or its contextually correct mate were spelled, and so "passed over" the homophone error because they took it to represent the contextually correct meaning.

The shortcomings of our word frequency manipulation notwithstanding, it is worth emphasizing that our stimulus pool of homophonic words did represent a wide range of word frequency counts (1 to 2470 occurrences per million according to Kucera & Francis, 1967, norms), and our item analyses did show that the immediate disruptiveness of homophone errors was highly significant across the entire pool of items. Consequently, at least for the range of words included here, we feel relatively confident in concluding that our on-line error detection results do not depend on the frequency of occurrence of the error word.

Experiment 2

The on-line reading data (Experiments 1a and 1b; Daneman & Reingold, 1993) are at odds with the proofreading data (Daneman & Stainton, 1991) on the issue of the time course of phonological activation during silent reading. Because the on-line reading data show that homophony does not interfere with the initial detection of contextually inconsistent homophonic errors, they suggest that phonology does not play a role in the initial activation of word meanings. Because the proofreading data show that homophony can interfere with the detection of homophonic errors, they suggest that phonology may play a significant and early role in the activation of word meanings. We consider the on-line data to be more compelling than the proofreading data. The on-line data represent evidence that is revealed spontaneously during the moment-to-moment processes of natural reading rather than evidence that is inferred indirectly from a secondary task. Consequently, we believe that the on-line data are more likely to provide a valid and sensitive index of the time course of phonological activation during the reading of normal connected text. In Experiment 2, we attempted to provide direct empirical evidence for this claim by collecting the eye fixation data in conjunction with an explicit proofreading task.

Method

Subjects.

The subjects were 30 University of Toronto undergraduates who had not participated in Experiments 1a and 1b. Each subject was tested individually in a session lasting approximately 50 minutes.

Materials and Procedure.

The experimental manipulation involved the same 1100 word Russell Wood text used in Daneman and Reingold's (1993) Experiment 2; the text contained 48 target words, 16 that appeared as the correct homophone, 16 as the contextually inconsistent homophone mate, and 16 as a nonhomophone control error matched for orthographic similarity to the correct word. Counterbalancing of target words was accomplished by creating three versions of the text; a target word appeared in a different form (correct word, homophone error, nonhomophone error) in each version. Each subject was randomly assigned to read one of the three error-filled versions. As in the Daneman and Reingold (1993) versions, half of the homophone errors shared considerable spelling similarity with their contextually correct homophone mates because they were the same length as them (e.g. hare which is the same length as hair; board which is the same length as bored); the other half were less similarly spelled because they were a different length than their contextually correct homophone mates (e.g. wade which is shorter than weighed, none which is longer than nun). The complete list of target words and their homophonic and nonhomophonic error forms is provided in Appendix B.

As in Daneman and Reingold (1993), the text was displayed on a VGA monochrome monitor in conventional upper and lower case black font with white background. In all, there were 20 screens of text, each containing no more than eight double-spaced lines of text. The procedure and equipment for recording eye movements were identical to those used in the Daneman and Reingold (1993) studies and in Experiments 1a and 1b in the present study. The only procedural difference was in the instructions given to subjects. Recall that in the Daneman and Reingold (1993) experiments and in Experiments 1a and 1b of this study, subjects were simply instructed to read for comprehension. In Experiment 2, subjects were explicitly asked to proofread for inconsistent words as they read. Subjects were told that in certain places in the story an original word had been substituted with an incorrect word. Whenever they came across a word that did not make sense in the context, they were to press the response button on the button box positioned comfortably on their laps. No information on the number or distribution of incorrect words was provided to the subject.

Even though Daneman and Reingold (1993) found no difference in eye movement patterns as a function of whether or not readers had been exposed to an intact error-free version of the Russell Wood text beforehand, we included the same familiarization manipulation in the present study; this meant that the two experiments were procedurally identical except for the no-proofreading versus proofreading dimension. The familiarization manipulation was conducted prior to the eye-tracking phase. As in Daneman and Reingold (1993), half of the subjects were given an error-free print-out of the Russell Wood story and asked to read it silently for comprehension; the other half were not familiarized on the error-free version first.

After completing the eye-tracking phase, subjects were given the same comprehension test and homophone spelling tests as administered by Daneman and Reingold (1993). The mean comprehension score was 8.26 out of a possible 10 (SD = 2.12); the reasonably high performance suggested that subjects were comprehending the story well. The mean spelling score on the forced-choice spelling test was 47.57 out of a possible 48 (SD = 0.50); the almost perfect performance indicated that, at least in a forced-choice situation, subjects were highly accurate at differentiating between the two spellings of homophonic words.

Data Analysis.

As in our earlier studies, the same three dependent measures were used as indices of on-line error detection and attempts at error recovery: (a) gaze duration on the target word; (b) total time on the target word; and (c) total repair time. In addition, button-pressing responses in the vicinity of the target word were used as an index of explicit error detection. Because target words were sufficiently far apart in the text, it was easy to assign a button-press to a particular target word even if the subject had progressed several words beyond the target word before pressing the error-detection button. False alarms were not a problem because subjects virtually never pressed the error-detection button when the target word was correct (false alarm rate for correct words was .02; see Table 3), and they never pressed the error-detection button when the screen of text contained no target words.

Results and Discussion

Table 3 provides the error detection responses and eye fixation data for Experiment 2. As in Daneman and Reingold (1993), the data have been averaged across the two familiarization conditions (familiarized and unfamiliarized) because preliminary analyses showed that familiarization did not influence the eye fixation patterns or interact with any of the experimental manipulations (all main effect and interaction Fs < 1). Familiarization also had no effect on error detection responses (all main effect and interaction Fs < 1).5

-----------------------------------

In draft, Table 3 would appear here

-----------------------------------

Error Detection Responses.

If the proportion of button-pressing responses is taken as an index of the proportion of errors detected, the results suggest that homophony interfered with error detection. As seen in Table 3, on average subjects made an error detection response for .02 of the correct words (these were the false alarms), whereas they made the error response for .60 of the homophone errors and .68 of the nonhomophone errors, F1(2,58) = 178.93, MSe = 0.04, p < .001, and F2(2,92) = 220.38, MSe = 0.03, p < .001. Pair-wise t-tests showed that all differences were significant. Readers made fewer error responses to the correct words than to the homophone errors (subject t(29) = 14.69, p < .001, and item t(47) = 3.90, p < .001); they made fewer error responses to the correct words than to the nonhomophonic errors (subject t(29) = 16.63, p < .001, and item t(47) = 20.67, p < .001); and they made fewer error responses to the homophone errors than to the nonhomophone errors (subject t(29) = 2.14, p < .04, and item t(47) = 2.12, p < .05). The effect was consistent across same-length and different-length pairs (subject and item interaction Fs < 1), thus suggesting that readers responded to both kinds of homophone errors less frequently than they did to nonhomophone errors. These data are consistent with Daneman and Stainton's (1991) proofreading data which also showed that subjects made fewer overt error detection responses to the "sound-okay" errors (although note that our effect was present whether or not the reader was familiarized on an error-free version of the text before proofreading). The absence of an overt response is usually assumed to indicate that the error has been missed. If we were to make this assumption then we would conclude that our homophone errors were detected less easily than our nonhomophone errors, and that phonology is used to activate word meanings.
This interpretation would of course be at odds with the interpretation based on the eye fixation data reported in Experiment 1 and in Daneman and Reingold (1993), because those data argued against the early engagement of phonological processes. But we are not going to make this assumption. In fact, we will use the Experiment 2 data to caution against assuming that the absence of an overt error-detection response (here a button-press) necessarily indicates that the error has been missed. Unlike the Daneman and Stainton (1991) task which depended on the overt proofreading responses to make inferences about phonological processing, this experiment collected eye fixation data in conjunction with the overt responses, and the eye fixation data caution against equating the reader's failure to make an overt error detection response with the reader having failed to detect the presence of the error word.

Eye Fixation Data.

A comparison of the eye fixation durations in Table 3 with those in Daneman and Reingold (1993) shows that overall reading was much slower under explicit instructions to proofread for inconsistent words (Experiment 2) than under simple instructions to read for comprehension (Daneman & Reingold, 1993). Presumably the additional time taken to read the Russell Wood passage in Experiment 2 reflected a variety of processes involved in deliberately searching for errors, deciding on the appropriate response, and executing a detection response. But slower reading aside, the pattern with respect to homophone errors resembled the Daneman and Reingold (1993) pattern in several crucial ways.

Initial Detection: Although the button-pressing data showed that readers made an overt error detection response less frequently for homophone errors than for nonhomophone errors, the eye fixation data replicated the Experiment 1 findings and the Daneman and Reingold (1993) findings by showing that homophone errors were as disruptive as nonhomophone errors when initially encountered. An ANOVA on the gaze durations showed a significant main effect for target word type, F1(2,58) = 19.13, MSe = 11548, p < .001, and F2(2,92) = 12.90, MSe = 13856, p < .001. As seen in Table 3, readers spent on average 304 ms when first encountering a contextually consistent homophone, whereas they spent an additional 99 ms on a homophone error and an additional 111 ms on a nonhomophone error. Pair-wise t-tests showed that the main effect could be attributed to the difference between processing a semantically consistent target word versus a semantically inconsistent one. Readers initially took longer to process a homophone error than its contextually correct homophone mate, subject t(29) = 5.85, p < .001, and item t(47) = 4.25, p < .001; similarly, they took longer to process a nonhomophone error than the contextually correct word, subject t(29) = 5.24, p < .001, and item t(47) = 5.08, p < .001. However, there was no difference in initial processing time for homophone versus nonhomophone errors, subject t(29) = 0.72, p > .47, and item t(47) = 0.54, p > .58, a result which shows that homophonic errors were as disruptive as nonhomophonic errors, and suggests that homophones were being detected as easily. As in the Daneman and Reingold (1993) study, the findings were consistent across same-length (e.g., board/bored) and different-length homophones (e.g., wade/weighed), thus suggesting that both types of homophone errors were detected as easily as the nonhomophone errors.

Our way of reconciling the eye fixation data with the error detection responses is to argue that readers were detecting the homophone errors as readily as the nonhomophone errors, but were more reluctant to commit themselves to a detection response in the case of the homophone errors, perhaps because they began to question their own ability to differentiate between the two homophone spellings. (We are talking here of a response criterion problem that is eliminated in the kinds of simple forced-choice spelling tests we give our subjects.) Although we do not have direct evidence for this interpretation, we do have evidence that readers did not reliably make the overt error detection response whenever they detected the presence of an inconsistent word. This evidence comes from a contingency analysis, in which we analyzed gaze durations for the homophone versus nonhomophone errors as a function of whether or not subjects made an overt error detection response.6 As seen in Table 4, subjects spent less time on an error word when they did not make an overt error detection response than when they did make the response, F1(1,26) = 19.41, MSe = 11383, p < .001, F2(1,41) = 3.15, MSe = 24888, p < .09; however, homophone errors were as disruptive as nonhomophone errors (subject and item Fs < 1), and type of error word did not interact with whether or not they made the overt response (subject and item interaction ps > .30). Thus the same initial disruptiveness of homophone errors relative to nonhomophone errors appears to be present regardless of whether or not readers made an overt detection response. Taken alone, the button-pressing proofreading response would lead to the conclusion that homophony interferes with error detection. The degree of disruption revealed by the eye fixations suggests this isn't so.
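The structure of the contingency analysis can be sketched as a simple cross-classification of error trials by type of error and by whether an overt detection response was made. The record layout, the function name, and the trial values below are hypothetical illustrations and are not data from the study; the sketch shows only how the cell means are formed, not the ANOVA computed on them.

```python
from collections import defaultdict
from statistics import mean

def contingency_means(trials):
    """Mean gaze duration for each (error type, overt response made?) cell."""
    cells = defaultdict(list)
    for trial in trials:
        cells[(trial["error_type"], trial["pressed"])].append(trial["gaze_ms"])
    return {cell: mean(values) for cell, values in cells.items()}

# Hypothetical error trials (illustrative values only).
trials = [
    {"error_type": "homophone", "pressed": True, "gaze_ms": 430},
    {"error_type": "homophone", "pressed": True, "gaze_ms": 450},
    {"error_type": "homophone", "pressed": False, "gaze_ms": 350},
    {"error_type": "homophone", "pressed": False, "gaze_ms": 370},
    {"error_type": "nonhomophone", "pressed": True, "gaze_ms": 440},
    {"error_type": "nonhomophone", "pressed": True, "gaze_ms": 460},
    {"error_type": "nonhomophone", "pressed": False, "gaze_ms": 360},
    {"error_type": "nonhomophone", "pressed": False, "gaze_ms": 380},
]

for cell, value in sorted(contingency_means(trials).items()):
    print(cell, value)
```

The comparison of interest is whether the homophone and nonhomophone cells differ within each level of the response factor; comparable gaze durations in the no-response cells are what indicate that undetected-by-button errors were nonetheless disruptive.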

-----------------------------------

In draft, Table 4 would appear here

-----------------------------------

Error Recovery: The post-detection error recovery data (1) corroborated the earlier eye movement studies by providing evidence for the delayed involvement of phonological codes in the recovery process, and (2) provided further support for the position that readers do not reliably make an overt error detection response whenever they detect the presence of an error word.

As in the earlier eye movement studies, evidence for the delayed activation of phonological codes came in the form of a homophone facilitation effect; readers spent less time reading and rereading phrases containing homophonic errors than nonhomophonic ones, presumably because they could exploit the shared phonology of homophones as a retrieval route to the correct meaning. The only difference between the Experiment 2 data and the Daneman and Reingold data was that the facilitation effect was present for the same-length homophones (e.g., board/bored) as well as the different-length homophones (e.g., wade/weighed) in Experiment 2, whereas it was only present for the same-length homophones in the Daneman and Reingold (1993) study. Again in the interests of brevity, only the results for total repair time will be presented in detail because the other measure of recovery time, total time on the target word, produced the same pattern of results. The ANOVA on total repair time revealed significant main effects of type of target word, F1(2,58) = 99.86, MSe = 413048, p < .001, and F2(2,92) = 104.22, MSe = 318764, p < .001, and length similarity, F1(1,29) = 29.42, MSe = 158178, p < .001, and F2(1,46) = 11.56, MSe = 298684, p < .001, as well as a significant target word x length-similarity interaction, F1(1,29) = 5.66, MSe = 194756, p < .01, and F2(2,92) = 2.73, MSe = 318764, p < .07.7 As was the case for the gaze duration data, pair-wise t-tests showed that readers took longer to process both kinds of semantically inconsistent target words relative to semantically consistent ones. Readers took longer in repair time for a homophone error than for its contextually correct mate: 879 ms longer if it was the same length as its mate, subject t(29) = 7.14, p < .001, and item t(23) = 9.49, p < .001, and 1259 ms longer if it was a different length, subject t(29) = 8.03, p < .001, and item t(23) = 9.17, p < .001.
Similarly, readers took longer in repair time for a nonhomophone error than for the contextually correct word: 1371 ms for same-length pairs, subject t(29) = 11.33, p < .001, and item t(23) = 19.45, p < .001, and 1895 ms for different-length pairs, subject t(29) = 11.28, p < .001, and item t(23) = 9.30, p < .001. However, unlike the gaze duration data, which showed no difference in initial processing time for homophone versus nonhomophone errors, the repair time data revealed that readers spent significantly less time repairing a homophonic error than a nonhomophonic error: 492 ms less if the homophone error was the same length as its contextually correct mate, subject t(29) = 4.15, p < .001, and item t(23) = 5.14, p < .001, and 636 ms less if it was a different length than its mate, subject t(29) = 4.02, p < .001, and item t(23) = 2.26, p < .05. Like Daneman and Reingold (1993), we interpret the homophone facilitation effect as evidence that readers have delayed access to the phonological codes and that they can exploit the shared phonological codes to repair inconsistencies arising from homophonic words. In the Daneman and Reingold (1993) study, the ability to exploit the phonological codes appeared to be somewhat limited, because the orthographic dissimilarity between different-length homophone pairs (e.g., wade/weighed; none/nun) eliminated the homophone facilitation effect, a result which suggested that readers still pay considerable attention to orthographic sources of information as they attempt to interpret and reinterpret inconsistent words. In the present study, the homophone facilitation effect was apparent even for the orthographically dissimilar different-length pairs. We think that the slower, more deliberate reading that is an integral part of the proofreading-while-reading paradigm is responsible for the more widespread homophone facilitation.
The more time readers spend engaged in deliberate error-recovery heuristics, the more opportunity they will have to gain access to the phonological codes and to exploit them during error recovery.
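
The size of the homophone facilitation effect at each length follows directly from the two sets of repair-time differences reported above. A short arithmetic check, using only the millisecond values given in the text (the variable names are ours, chosen for illustration):

```python
# Extra repair time relative to the contextually correct word, in ms (values from the text).
excess_repair_ms = {
    "same_length": {"homophone": 879, "nonhomophone": 1371},
    "different_length": {"homophone": 1259, "nonhomophone": 1895},
}

# Facilitation = how much less repair time a homophone error needs than a nonhomophone error.
facilitation_ms = {
    length: cells["nonhomophone"] - cells["homophone"]
    for length, cells in excess_repair_ms.items()
}
print(facilitation_ms)  # {'same_length': 492, 'different_length': 636}
```

Both differences match the 492 ms and 636 ms facilitation values reported for same-length and different-length pairs.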

The error-recovery data also showed that readers spent considerable time trying to repair an inconsistency even when they did not make an overt error-detection response. As Table 4 shows, readers did spend more time reading and rereading inconsistent phrases when they made an overt error detection response than when they did not, F1(1,26) = 27.37, MSe = 821597, p < .001, and F2(1,41) = 37.49, MSe = 1203539, p < .001. However, they spent less time on the homophone errors than on the nonhomophone errors, F1(1,26) = 22.06, MSe = 530900, p < .001, and F2(1,41) = 6.14, MSe = 1042174, p < .02, and the size of this facilitation effect was the same whether or not readers made the overt response (subject and item interaction Fs < 1). As the considerably lengthy repair times in Table 4 indicate, it would be difficult to argue that readers had not detected the inconsistent homophone (or the inconsistent nonhomophone for that matter), even when they did not make the overt error detection response. For those instances in which readers failed to make an overt error detection response, they spent on average 1187 ms reading and rereading an inconsistent homophonic phrase, and they spent 1631 ms reading and rereading an inconsistent nonhomophonic phrase. Because these repair times are considerably longer than the average 636 ms spent on the corresponding contextually consistent phrases (see Table 3),8 they suggest that readers were not only often disrupted by the inconsistency, but they also engaged in a deliberate and lengthy attempt to resolve the inconsistency. The fact that readers dwelled less on the inconsistent homophonic phrases (e.g., Alone at his teller's cage, idle and board...) than on the inconsistent nonhomophonic ones (e.g., Alone at his teller's cage, idle and beard...) is probably attributable to the fact that they could use the shared phonological code (e.g., /bɔrd/) to recover the contextually consistent meaning (e.g., bored meaning "disinterested").
However, recovery of the contextually consistent meaning would not necessarily translate into an overt error detection response. Indeed, we believe that readers may have become reluctant to commit themselves to an overt error detection response for homophonic errors because the very activation of two spellings (board and bored) and two meanings ("plank" and "disinterested") may have reduced their confidence in having the ability to judge which spelling belonged to which meaning, and therefore reduced their confidence in making the decision as to whether the text spelling was correct or incorrect. We do not have direct evidence for this claim. However, our data do caution against equating the reader's failure to make an overt error detection response with the reader having failed to detect the presence of the error word when first encountering it. And if there is any reason to believe that the criterion for making an error detection response differs for homophonic versus nonhomophonic words, then it is risky to interpret fewer overt proofreading responses to homophone errors than to nonhomophone errors as evidence that the homophone errors were noticed less often. In any event, the main contribution of Experiment 2 is that it cautions us against basing our conclusions about phonology on a subsidiary proofreading response signalled by a button-press, and it shows that the on-line reading time data expose the time course of phonological activation during normal reading in a way that the subsidiary proofreading measure is not sensitive enough to do. Our eye fixation data have consistently shown that homophonic errors are initially as disruptive as nonhomophonic errors, and that phonology has its effects only after the word has been identified and the reader is having difficulty integrating its meaning with the preceding context.

Conclusions

This study has shown how eye fixation data can be used to reveal the time course of phonological processes during the silent reading of natural connected prose. The study makes two important contributions to the literature on phonological coding and reading, one theoretical and the other methodological.

On the theoretical side, the eye fixation data provide compelling evidence that phonological sources of activation do not mediate lexical access. Whereas data based on more indirect tasks (e.g., Daneman & Stainton, 1991; Van Orden, 1987) have supported a theory of lexical access in which phonological codes play an early and dominant role, our eye fixation data call for a more restricted and delayed involvement of phonology by showing that homophonic errors (e.g., Alone at his teller's cage, idle and board...) were initially as disruptive as nonhomophonic errors (e.g., Alone at his teller's cage, idle and beard...), with homophony only making a difference in the post-detection lexical access error recovery processes (see also Daneman & Reingold, 1993). The results do not appear to be dependent on word frequency, because Experiment 1 showed that readers are as likely to bypass phonology in activating the meanings of low frequency words (e.g., bored) as they are when accessing the meanings of high frequency words (e.g., board). However, further studies should manipulate word frequency using a more complete design.

On the methodological side, Experiment 2 cautions against making inferences about phonological processes from an indirect proofreading response. By collecting eye movement data in conjunction with an explicit proofreading task, we showed that button-pressing proofreading responses are unreliable indices of error detection because even when readers fail to make an overt error detection response, their eye fixations reveal that they have detected an error. Taken alone, the button-pressing proofreading data would lead to the conclusion that phonology is used to activate word meanings because they showed that readers were less likely to make an overt detection response in the presence of homophonic errors than in the presence of nonhomophonic errors. However, the eye fixations revealed the same degree of disruption for homophonic errors as for nonhomophonic errors, leading to the conclusion that orthography rather than phonology is used to activate word meanings. If a secondary response is an unreliable index of the time course of phonological activation during reading, it may be an unreliable index of other components of the reading process too.

References

Baddeley, A.D. (1979). Working memory and reading. In P.A. Kolers, M.E. Wrolstad, & H. Bouma (Eds.), Processing of Visible Language (pp. 355-370). New York: Plenum Press.

Baddeley, A.D., Eldridge, M., & Lewis, V. (1981). The role of subvocalization in reading. Quarterly Journal of Experimental Psychology, 33A, 439-454.

Besner, D. (1987). Phonology, lexical access in reading, and articulatory suppression: A critical review. Quarterly Journal of Experimental Psychology, 39A, 467-478.

Callaghan, B. (1988). The Black Queen. In M. Atwood & R. Weaver (Eds.), The Oxford book of Canadian short stories (pp. 305-307). Oxford: Oxford University Press.

Carpenter, P.A., & Daneman, M. (1981). Lexical retrieval and error recovery in reading: A model based on eye fixations. Journal of Verbal Learning and Verbal Behavior, 20, 137-160.

Coltheart, M. (1978). Lexical access in simple reading tasks. In G. Underwood (Ed.), Strategies in information processing (pp. 151-216). London: Academic Press.

Coltheart, M., Davelaar, E., Jonasson, J.T., & Besner, D. (1977). Access to the internal lexicon. In S. Dornic (Ed.), Attention and Performance VI (pp. 534-555). New York: Academic Press.

Coltheart, V., Avons, S.E., & Trollope, J. (1990). Articulatory suppression and phonological codes in reading for meaning. Quarterly Journal of Experimental Psychology, 42A, 375-399.

Coltheart, V., Laxon, V., Rickard, M., & Elton, C. (1988). Phonological recoding in reading for meaning by adults and children. Journal of Experimental Psychology: Learning, Memory and Cognition, 14, 387-397.

Daneman, M., & Reingold, E. (1993). What eye fixations tell us about phonological recoding during reading. Canadian Journal of Experimental Psychology, 47, 153-178.

Daneman, M., & Stainton, M. (1991). Phonological recoding in silent reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 618-632.

Doctor, E.A., & Coltheart, M. (1980). Children's use of phonological encoding when reading for meaning. Memory & Cognition, 8, 195-209.

Frazier, L., & Rayner, K. (1982). Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology, 14, 178-210.

Inhoff, A.W. (1984). Two stages of word processing during eye fixations in the reading of prose. Journal of Verbal Learning and Verbal Behavior, 23, 612-624.

Inhoff, A.W., & Topolski, R. (in press). Use of phonological codes during eye fixations in reading and in on-line and delayed naming tasks. Journal of Memory and Language.

Jared, D., & Seidenberg, M.S. (1991). Does word identification proceed from spelling to sound to meaning? Journal of Experimental Psychology: General, 120, 358-394.

Johnston, R.S., Rugg, M.D., & Scott, T. (1987). The influence of phonology on good and poor readers when reading for meaning. Journal of Memory and Language, 26, 57-68.

Just, M.A., & Carpenter, P.A. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87, 329-354.

Katz, L., & Frost, R. (1992). The reading process is different for different orthographies: The orthographic depth hypothesis. In R. Frost & L. Katz (Eds.), Orthography, phonology, morphology, and meaning. North Holland: Elsevier Science Publishers.

Kucera, H., & Francis, W.N. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.

McCusker, L.X., Hillinger, M.L., & Bias, R.G. (1981). Phonological recoding and reading. Psychological Bulletin, 89, 217-245.

McCutchen, D., & Perfetti, C.A. (1982). The visual tongue-twister effect: Phonological activation in silent reading. Journal of Verbal Learning and Verbal Behavior, 21, 672-687.

Patterson, K., & Coltheart, V. (1987). Phonological processes in reading: A tutorial review. In M. Coltheart (Ed.), Attention and Performance XII: The psychology of reading (pp. 421-447). Hove: Erlbaum.

Perfetti, C.A., Bell, L.C., & Delaney, S.M. (1988). Automatic (prelexical) phonetic activation in silent word reading: Evidence from backward masking. Journal of Memory and Language, 27, 59-70.

Pollatsek, A., Lesch, M., Morris, R.K., & Rayner, K. (1992). Phonological codes are used in integrating information across saccades in word identification and reading. Journal of Experimental Psychology: Human Perception and Performance, 18, 148-162.

Rayner, K., Sereno, S.C., Lesch, M.F., & Pollatsek, A. (in press). Phonological codes are automatically activated during reading: Evidence from an eye movement priming paradigm. Psychological Science.

Rubenstein, H., Lewis, S.S., & Rubenstein, M.A. (1971). Evidence for phonemic recoding in visual word recognition. Journal of Verbal Learning and Verbal Behavior, 10, 645-657.

Scott, D.C. (1988). The Desjardins. In M. Atwood & R. Weaver (Eds.), The Oxford book of Canadian short stories (pp. 24-28). Oxford: Oxford University Press.

Seidenberg, M.S. (1985). The time course of information activation and utilization in visual word recognition. In D. Besner, T.G. Waller, & G.E. MacKinnon (Eds.), Reading research, Vol. 5 (pp. 199-252). Orlando: Academic Press.

Stampe, D. (1993). Heuristic filtering and reliable calibration methods for video-based pupil tracking systems. Behavior Research Methods, Instruments, & Computers, 25, 137-142.

Treiman, R., Freyd, J., & Baron, J. (1983). Phonological recoding and use of spelling-sound rules in reading of sentences. Journal of Verbal Learning and Verbal Behavior, 22, 682-700.

Van Orden, G.C. (1987). A ROWS is a ROSE: Spelling, sound and reading. Memory & Cognition, 15, 181-198.

Van Orden, G.C., Johnston, J.C., & Hale, B.L. (1988). Word identification in reading proceeds from spelling to sound to meaning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 371-386.

Van Orden, G. C., Pennington, B.F., & Stone, G.O. (1990). Word identification in reading and the promise of subsymbolic psycholinguistics. Psychological Review, 97, 488-522.

Authors' Notes

This research was supported by grants from the Natural Sciences and Engineering Research Council of Canada to M. Daneman and E. Reingold. We thank Derek Besner and Elizabeth Bosman for their useful comments and suggestions, and Murray Stainton and David Weinstock for their help in data collection. We also thank David Stampe for his indispensable role in developing our eye tracking system.

Footnotes

1 Although see also Pollatsek et al. (1992) who use a homophone preview/priming paradigm to argue for the early involvement of phonological codes in word identification and lexical access. The Pollatsek et al. (1992) paradigm and results will be discussed later.

2 There were homophone errors such as to and for with frequency counts as high as 26,190 and 9,495 occurrences per million respectively, versus homophone errors such as browse and whined with frequency counts as low as 1 per million or less than 1 per million (i.e., did not appear in the Kucera & Francis, 1967, norms).

3 The mean Kucera and Francis (1967) frequency count for the orthographic controls was 57 occurrences per million (median = 36, SD = 77).

4 Note that gaze duration (also referred to in the literature as "first pass reading time") would include the duration of more than one fixation if the reader refixated the target word before leaving it. We believe that gaze duration is a better index of semantic inconsistency detection than is the duration of only the first fixation on the target word, because gaze duration is more likely to capture the later stages of processing (Inhoff, 1984) that are involved in detecting a semantic inconsistency between the target word and the prior text. Nevertheless, like Daneman and Reingold (1993), we also analyzed the duration of the first fixation on the target word. As in Daneman and Reingold (1993), we found that the first fixation data showed the same pattern as the gaze duration data in Experiments 1a and 1b, but the effects were weaker.

5 It is not surprising that the familiarization manipulation used here (and in Daneman and Reingold, 1993) was weaker than the familiarization manipulation used in Daneman and Stainton's (1991) paper-and-pencil proofreading task, because here the familiarization experience and the proofreading experience differed from one another in many more ways. Remember that in Daneman and Stainton's (1991) paper-and-pencil proofreading task, the familiarization text and proofread text were physically identical except for the absence versus presence of error words; both were printed on regular paper, double-spaced, with identical font, identical line breaks, pagination, etc. In our eye movement studies, the familiarization text was presented on five pieces of paper in conventional double-spaced format, whereas the proofread text was presented on twenty successive computer screens, with different line breaks, font, etc.; any of these differences in task and physical appearance could have wiped out any benefit of familiarity.

6 Note that the data for the contextually correct control words could not be included in this analysis because there were very few items for which subjects made an error detection response. And even for the homophone error and nonhomophone error data, some subjects and items had to be excluded from the analysis because there were missing data from at least one cell in the four-cell matrix (homophone--button press; homophone--no button press; nonhomophone--button press; nonhomophone--no button press). The analysis reported here included 27 of the 30 subjects and 42 of the 48 target words.

7 Unlike the Daneman and Reingold (1993) interaction which showed that the homophone facilitation effect was restricted to the same-length homophone pairs, the interaction here was simply attributable to the fact that length similarity influenced processing time only for the contextually inconsistent words, not for the contextually consistent ones. In other words, subjects spent less time repairing same-length homophone errors than different-length homophone errors, subject t(29) = 3.13, p < .01, and item t(46) = 2.43, p < .05; they also spent less time repairing same-length nonhomophone errors than different-length ones, subject t(29) = 4.54, p < .01, and item t(46) = 2.59, p < .05; however, there was no difference in processing time for same-length versus different-length contextually correct homophone controls, subject t(29) = 0.26, p > .75, and item t(46) = 0.06, p > .95.

8 Because readers made very few overt error detection responses for correct words (M = .02), the 636 ms provides a good approximation of processing time for correct phrases when no overt response is made.



Appendix A

Appendix B