Discretion in the Translation of Reading Research to Policy
Barbara M. Taylor, University of Minnesota
Richard C. Anderson, University of Illinois
Kathryn H. Au, University of Hawaii
Taffy E. Raphael, Oakland University
The history of beginning reading instruction in the United States has been characterized as a "great debate" pitting code-emphasis approaches against meaning-emphasis approaches (e.g., Bond & Dykstra, 1967; Chall, 1967). In the past, research on beginning reading was read primarily by reading educators and researchers, who were also the chief participants in the "great debate." Today, with their interest in reading research, policymakers and the public have become centrally involved in the debate over instructional methods (Pearson, 1997). In her April 29, 1998, column in Education Week, Kathleen Kennedy Manzo illustrates this trend:
State lawmakers around the country, citing poor reading scores and what they see as the failure of schools to find a sure formula for improving literacy, have decided to take on the task themselves. As a result, educators from New York to California have been faced with increasingly prescriptive mandates designed to change the way children are taught to read. (Manzo, 1998b, p. 24)
Over the past three years, articles appearing in Education Week (e.g., Diegmueller, 1996; Manzo, 1997, 1998a, 1998b), as well as in the popular press (e.g., Markley, 1998; Howatt, 1998), have cited research funded by the National Institute of Child Health and Human Development (NICHD) as providing strong support for an emphasis on explicit phonics instruction and the reading of decodable texts. One study in particular, by Foorman, Francis, Fletcher, Schatschneider, and Mehta (1998), has received an extraordinary amount of attention in these articles. It has been characterized publicly as follows:
The study found that explicit, systematic phonics instruction led to higher word-recognition skills among poor 1st and 2nd graders than methods that teach phonics less directly. The study has been held up by legislators around the country, who have called for schools to get back to the basics. They say the findings are proof that phonics is the best way to teach children to read. (Manzo, 1998c)
Public dissemination of a study prior to peer review and acceptance for publication is problematic since the peer review process helps to identify flaws in the research that may cast doubt on the researchers' conclusions. Yet a preliminary version of the Foorman study was described in Education Week in March, 1997 (Manzo, 1997), six months before its September 1997 acceptance for publication. Further, in another article in Education Week, Lyon (1997) discussed findings from the study that were apparently called into question since they did not appear in the published version of the study. Foorman herself presented the study at a meeting of the Education Committee of the California Assembly in May, 1996 (Foorman, 1996), just prior to the passage of a California law requiring professional development for K-3 teachers to focus on systematic phonics instruction. Thus, attention to the Foorman et al. (1998) study began long before it was accepted by a refereed journal, and in some cases findings were highlighted that were not deemed sound enough for ultimate publication.
Once the study became available in its final form, it continued to draw attention. For example, eight pages in the influential report of the National Research Council (Snow, Burns, & Griffin, 1998, pp. 198-206) were devoted to a detailed--but noncritical--summary of the Foorman et al. study, used to support explicit teaching of the alphabetic principle. In part because of its unusual media coverage, the study is influencing national and local policy decisions about reading instruction, teacher preparation, and even the curriculum materials and staff development on which state money may be spent, as the "back to basics" movement gathers momentum (Manzo, 1997, 1998b).
The study by Foorman et al. represents a departure from many other NICHD studies in that it examined the effectiveness of methods of reading instruction used by teachers within schools as part of their ongoing classroom instruction, rather than during separate, "pull-out" programs. Specifically, it compared first and second graders' learning in three conditions: direct instruction in letter-sound correspondences with decodable text (Direct Code), less direct instruction in spelling-sound patterns embedded in connected text (Embedded Code), and implicit instruction in letter-sound correspondences during the reading of connected text (Implicit Code). There were two variations in the implicit code condition: a standard group that received the district's regular curriculum, and what the researchers called a "research group" that received instruction developed specifically for this study. The study purports to demonstrate the superiority of direct code instruction.
We have adopted the approach of taking this research study, published in one of the most prestigious educational research journals, as our case -- our "text," if you will. Our purpose is to analyze this case and the responses to it by the press, the professional community, and the general public, to see what we can learn about potential uses and misuses of research when translated into policy and educational practice (see also Taylor, 1998). While we believe that research can and should inform decisions about policy and practice, we believe that the literacy research cited to justify policy should reflect a broad understanding of literacy, meet high standards of quality, and have the potential to improve student achievement. Our analysis suggests that the Foorman et al. study does not meet these criteria, although it has been cited as prime evidence to support policy decisions (Manzo, 1997, 1998b, 1998c). This case leads us to question the wisdom of basing policy on a single study, or even on a single line of research.
We analyze the research article in terms of four assumptions that appear to underlie the thinking of Foorman and her colleagues, thereby affecting their design and interpretation of findings. First, the text created by these researchers equates word-level processing with reading. Second, it equates deprivation with difference. Third, the text equates instructional method with teaching. Fourth, it equates training with professional development. These assumptions provide the lens through which the researchers interpret their findings and, we believe, blind them to plausible alternative interpretations that policymakers and practitioners need to understand. The stance one takes with regard to these four assumptions is pivotal in policy and practice in literacy education. We end by considering the implications of our analysis for policy.
Certainly, the history of reading instruction has been one of controversy. One of the major reasons why controversy exists is the complexity of reading. As pointed out in the nationally commissioned report, Preventing Reading Difficulties in Young Children (Snow et al., 1998), "reading is inextricably embedded in educational, social, historical, cultural, and biological realities" (p. 33). Teaching children to read involves more than imparting knowledge about letters and sounds. As a language process, reading draws on all the areas depicted in Figure 1.
To focus on beginning reading as a word-level decoding process is to ignore the larger system that produces widespread failure in learning to read in low-income children. In fact, such a limited view may lead to unwarranted optimism about solutions to the problems low-income children face in learning to read.
Thus, our first step in analyzing the Foorman et al. study, in terms of its appropriateness for informing decisions about policy and practice, is to look at the definitions of reading and reading instruction guiding these researchers and at how these definitions relate to literacy and literacy instruction. The text provided by Foorman and her colleagues focuses unduly on instruction in word-level processing as the key to successful beginning reading. Following Gough and Hillinger (1980), the text refers to reading as an "unnatural act" and states that:
... in urban settings, there are entire schools in which reading failure is the norm, in part because of lack of home preparation in understanding the alphabetic principle ... and also because of inadequate instruction in the classroom ... (p. 37)
We believe that the contribution of word-level knowledge can be best understood and appreciated when embedded within a conception of the literacy curriculum that is based on research from many domains (see Figure 1). In this conception, the literacy curriculum should address students' ownership of literacy or motivation to read--aspects that are necessary for the sustained effort needed to become a proficient reader (Guthrie et al., 1996). It should attend to the strategies and skills for constructing meaning, through instruction in reading comprehension and the writing process (Dole, Duffy, Roehler, & Pearson, 1991; Dyson & Freedman, 1991; Hansen, 1998; Many, Fyfe, Lewis, & Mitchell, 1996; Purcell-Gates, 1998a). Given that reading involves many types of literature, the curriculum should help students understand the forms and functions of different kinds of text (Macken-Horarik, 1997). Further, a good literacy curriculum teaches students the conventions of language, from how speech sounds relate to printed symbols (Adams, 1990) to knowing conventions for engaging in discussions about literature (Raphael & Goatley, 1997), mathematics (Ball, 1993), or science (Lemke, 1990; Wells & Chang-Wells, 1997).
Such a literacy curriculum exists within a broader social context that can support or impede children's learning to read and write. For example, consider the relationship between the literacy curriculum and the family and community, one of the circles shown in Figure 1. There may be a match or mismatch between teachers' ways of organizing classroom discussions of text and children's ways of speaking at home (Au & Mason, 1983). Children's home language may be built upon or ignored by the school (Moll & Diaz, 1987). Children have family literacy experiences that may fit well with school expectations or go unrecognized (Gadsden, 1998; Taylor & Dorsey-Gaines, 1988). We have much to do as a society to guarantee strong school-home-community connections for all beginning readers, not just for those of mainstream backgrounds. Similar cases can be made for the importance of the school context (Allington & Walmsley, 1995; Taylor & Pearson, 1999), teachers' abilities and conceptions of classroom teaching (Hoffman, 1991; Snow et al., 1998; Wharton-MacDonald, Pressley, & Hampston, 1998), and the school curriculum (Goodlad & Su, 1992; Shulman & Quinlan, 1996; Wixson, Peters, & Potter, 1996). All these aspects of social context influence students' reading achievement.
Research from many traditions has explored the features of effective literacy instruction (see reviews by Au, 1998; Snow et al., 1998). After reviewing research on effective prevention and literacy instruction delivered in preschool, kindergarten, and primary grades, as well as organizational factors at the classroom, school, and district levels, Snow et al. (1998, p. 314) conclude that "effective instruction includes artful teaching that transcends--and often makes up for--the constraints and limitations of specific instructional programs." Au's (1998) review of research focuses on literacy learning of students of diverse backgrounds. She concludes that if students are to achieve at higher levels, educators must emphasize ownership; push for biliteracy rather than using the home language only as a vehicle for English literacy; have students read multicultural literature; and teach skills explicitly, within the context of authentic literacy activities. Similarly, scholars at the Center for the Improvement of Early Reading Achievement (CIERA) offer 10 principles for effective reading instruction. With respect to struggling readers, CIERA emphasizes the importance of "well-balanced instructional programs that benefit all children who are learning to read and write." Such programs generally are characterized by intensive one-on-one or small-group instruction, a blend of meaning and code approaches, thoroughly individualized diagnosis, and extensive experiences with an array of texts (CIERA, 1998).
Together, these descriptions of appropriate early reading instruction for all youngsters, including those with reading problems, emphasize a broad curriculum using meaningful and varied texts for purposes that mirror reading and writing in the real world. Systematic instruction in language conventions to instill knowledge of letter-sound correspondences and application of this knowledge to print is emphasized, but within the full literacy curriculum displayed in Figure 1.
In contrast, the text of the Foorman et al. research holds to the idea that explicitly teaching the alphabetic principle can prevent reading failure. Other potentially important factors in literacy development are either ignored or downplayed. In our view, Foorman and her colleagues have limited their attention to too small a part of literacy learning, just the part indicated by "AP" (alphabetic principle) within the language conventions area displayed in Figure 1. While we certainly agree that understanding the alphabetic principle is important for successful reading, it is but one of many factors that need to be addressed to improve the reading achievement of young children in urban schools.
The outcome measures used in the study, and the ways in which they are interpreted, reflect the researchers' apparent assumptions. While data were collected using a range of measures, the analyses highlight only measures that involve words in isolation, letters, and sounds. Four measures were administered four times during the year to estimate what the authors referred to as growth in reading. The Peabody Picture Vocabulary Test (Dunn & Dunn, 1981) assessed growth in receptive vocabulary. A list of 50 words to be read aloud assessed changes in what the authors refer to as "reading skills," though the task would be more accurately described as a measure of word pronunciation. The synthesis and analysis tests in the Torgeson-Wagner (Wagner, Torgeson, & Rashotte, 1994) battery assessed phonological processing. Tasks in the synthesis test included blending phonemes in words and nonwords, while tasks in the analysis test included the segmentation of spoken words into phonemes.
Six reading performance measures were administered at the end of the school year, in addition to the four administered throughout the year. The Wechsler Intelligence Scale for Children-Revised (WISC-R, Wechsler, 1974) was used, but results by instructional condition and grade are not reported. The letter-word identification and word attack subtests of the Woodcock-Johnson (Woodcock & Johnson, 1989) battery assessed word-level processing. The word identification subtest consists of a list of real words, presented in isolation, while the word attack subtest consists of a list of nonwords. The Woodcock-Johnson passage comprehension subtest uses a modified cloze procedure in which students are asked to read sentences and provide the missing words. The Formal Reading Inventory (Wiederholt, 1986) assessed narrative and expository text comprehension. The spelling dictation subtest of the Kaufman Test of Educational Achievement assessed sound/symbol knowledge (Kaufman & Kaufman, 1985).
In the spring, students' self-esteem and attitudes toward reading were assessed. A pictorial version of Harter's Perceived Competence Scale (Harter, 1982; Harter & Pike, 1984) assessed self-esteem. Attitude toward reading was measured, with questions asked of the children about their enjoyment of reading and their participation in a range of literacy experiences. Teachers rated each child using the Multigrade Inventory for Teachers (Agronin, Holahan, Shaywitz, & Shaywitz, 1992) which has scales in six areas: academic, activity, language, dexterity, behavior, and attention. Further, teachers completed an evaluation recording special services received by the child, grades, absences, tardiness, results of hearing and visual screenings, and behavioral or family problems.
On the surface, this variety of measures suggests an adequate representation of reading and the potential to look at students' progress on multiple levels. However, a closer examination shows that important aspects of reading were not assessed at all, or not assessed well. Thus, the set of measures lacks construct validity (Messick, 1989), falling prey to the threat of construct underrepresentation. No measure involved reading connected text from books likely to be found in primary-grade classrooms, although respected outcome measures are available [e.g., words correct per minute (Deno, 1985), running records (Clay, 1993a), miscue analysis (Goodman & Goodman, 1994), informal reading inventories (Leslie & Caldwell, 1995), story retellings and rewritings (Morrow, 1985, 1992), and probed recall comprehension test (Morrow & Smith, 1988; Morrow, 1992)]. Further, the multiple-choice measure of comprehension--the Formal Reading Inventory--was determined by the authors to be too difficult and was not administered to 21% of the children. The Woodcock-Johnson passage comprehension test requires inserting words into blanks in sentences. No measure required writing samples, commonly used to show students' knowledge of letter-sound relationships as well as other understandings of print (Clay, 1993a).
The 50-item word list "reading" measure employed by Foorman and her associates was exceedingly difficult. At the end of the year, the first-grade groups averaged only from 4% to 25% correct on this test. A look at the test items reveals no words like cat, dog, hit, or run. In fact, there are no words in the test with fewer than four letters. Most of the words on the test would be rare in first- or second-grade reading material, and some might not even be in the oral language vocabulary of low-performing children. We wonder whether the test provided a fair assessment of the word reading skills of urban Title I first graders. In particular, the difficulty and uniqueness of the words call into question Foorman et al.'s comparisons of the percentage of children who failed to pronounce even one or two items correctly (pp. 45-46). Exploiting these comparisons in their zeal to prove their theory, Foorman and her colleagues have the effect, no doubt unintended, of shaming poor children and their teachers. They create the unfair and ultimately dangerous impression, whatever their intentions, that reading instruction in a number of urban public classrooms is worthless.
The text states that 46% of the Implicit Code and 44% of the Embedded Code students "exhibited no demonstrable growth in word reading compared with only 16% in the DC (Direct Code) group" (p. 51). However, this statement is based on students' pronouncing the items on the extremely difficult 50-item list of isolated words, not reading words in connected text, a substantially different task, or even reading a list of high-frequency words, which children might have encountered in their instruction. A wider range of measures, especially ones that assessed typical reading, would have been appropriate and more compelling, considering the claims the authors hoped to make about the relationship between students' reading performance and instructional approach.
Recently, Messick (1994) and others have stressed the importance of the consequential validity of tests--that is, the intended and unintended consequences of giving tests and interpreting the scores. In our judgment, investigators must be especially cautious in any instance in which test results and interpretation might contribute to narrowing the curriculum for students of diverse backgrounds. Foorman et al. used results from a test battery weighted towards phonological processing and word decoding, but interpreted the results to support direct phonics instruction to solve a broad array of reading problems. Students of diverse backgrounds already typically receive large doses of instruction in isolated, lower level skills with little opportunity to engage in higher level thinking about text (Allington, 1991; Darling-Hammond, 1995). The Foorman study may have the lamentable consequence of leading to even more skill and drill and even less thought-provoking experience with meaningful text for children in poor urban schools.
Foorman and her colleagues' text reflects a deprivation view of reading difficulties when attributing reading failure in entire schools within urban districts not only to inadequate classroom instruction but to "lack of home preparation in understanding the alphabetic principle" (p. 37). Their argument relies on the cultural deprivation paradigm to explain, in part, the poor reading achievement of children from low-income families. According to Banks (1995), those who adopt this stance believe that schools must help low-income students surmount the deficits that arise from their home experiences. Banks points out two problems with this stance, both evident in the study by Foorman and her colleagues. First, this stance reduces attention to the fundamental structural changes needed in schools if they are to meet the twin goals of equity and excellence in education. Second, this stance ignores students' strengths and focuses instead on their weaknesses.
Consistent with a deprivation stance, the text of the Foorman et al. report does not acknowledge that structural conditions in schools may create conditions of discrimination or exacerbate conditions of discrimination within the larger society. The authors describe the sample in their study as 60% African-American, 20% Hispanic, and 20% white, while noting that the ethnic composition of students in the school district is 20% Asian, 26% African-American, 23% Hispanic, and 31% white. As in many urban districts, African-American children are overrepresented in the group qualifying for Title I services, yet the text provides no comment upon this situation. By latching onto instruction in the alphabetic principle as the solution to improving reading achievement, the researchers fail to challenge the bias in the larger system that regularly places a disproportionate number of African-American children in remedial reading programs and deprive themselves of a rich body of literature that might have informed the design of their instructional programs.
Researchers operating from the cultural difference, rather than deprivation, paradigm have documented the variety of home literacy experiences of students of diverse backgrounds. Notable studies include those by Teale (1986) and Taylor and Dorsey-Gaines (1988). The low-income Hawaiian parents studied by Levin (1992) believed that it was important for their preschool children to know the letters of the alphabet, and they attempted to teach these skills directly. Purcell-Gates (1998b) found that low-income parents began or increased their involvement in their children's literacy learning when formal literacy instruction began in school. These studies suggest that some children's problems with literacy learning in school may result less from a lack of relevant home experience with literacy and more from schools' inability to value and build upon the strengths in literacy children may already have developed. The Foorman et al. text shows little attention to or respect for the knowledge of language and literacy the children in this study may have brought from home or awareness of the possibilities of building upon this knowledge.
The review of literature in the Foorman et al. text further underscores the degree to which differences due to factors such as ethnicity, primary language, and social class are downplayed or ignored. The possible influence of these factors goes unacknowledged when research supporting the thesis of the study is discussed. For example, the text cites Francis, Shaywitz, Stuebing, Shaywitz, and Fletcher (1996), work in which 84% of the subjects were white, 11% African-American, and 5% of other races--a sample very different from that in the Foorman et al. study. The text cites Fletcher et al.'s (1994) work in which subjects were identified as learning disabled, although none of the students in the Foorman et al. study were so classified. In the study by Stanovich and Siegel (1994), the subjects were largely middle class, fewer than 2% were nonwhite, and all spoke English as their primary language. In the study by Wagner, Torgeson, and Rashotte (1994), 75% of the children were white, while 25% were African-American; no information is given about the socioeconomic status of the students. Vellutino et al. (1996) worked with kindergarten children in six middle- to upper-middle-class school districts. The research by Byrne and Fielding-Barnsley (1995) was conducted in Australia. In short, the text attempts to make the case for a direct connection between these studies and students in urban schools in the United States, apparently on the assumption that differences in student populations with respect to socioeconomic status, ethnicity, achievement, and language play little or no role.
As with differences in other characteristics, differences due to poverty are largely ignored in the Foorman et al. text, despite well-documented evidence that children from schools with higher levels of poverty do more poorly in reading than children from schools with lower levels (Jerald & Curran, 1998; Puma, Jones, Rock, & Fernandez, 1993). "School poverty depresses the scores of all students in schools where at least half of the students are eligible for subsidized lunch, and seriously depresses the scores when more than 75% of students live in low-income households" (Puma et al., 1997, p. 12). Puma et al. concluded that the poverty level of the school a student attends may be as influential in determining achievement as the student's family poverty level. Three fourths of the classrooms in the Direct Code condition were in schools that had 43%, 42%, or 40% poverty, while only one fourth were in schools at the 65% poverty level. The distribution was reversed in the Embedded Code condition: three fourths were in schools with 64% poverty and one fourth in schools with 32% poverty. In the Implicit Code condition, two thirds were at 50% poverty, while the rest were distributed fairly evenly at 64%, 40%, and 32% poverty levels. This imbalance is, at the very least, a confounding factor. Yet the researchers convey, through the design of the study and assignment of treatment groups, that differences due to poverty are not as important as those due to instructional treatment.
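The imbalance can be made concrete with a rough calculation of the mean school poverty level faced by classrooms in each condition. The sketch below uses only the proportions and poverty levels quoted above. It assumes, because the text does not say otherwise, that the Direct Code three fourths is split evenly across the 43%, 42%, and 40% schools and that the remaining third of Implicit Code classrooms is split evenly across its three schools. The names `conditions` and `weighted_poverty` are illustrative only.

```python
from fractions import Fraction as F

# (share of classrooms, school poverty %) pairs taken from the text.
# ASSUMPTION: even splits wherever the text gives only a combined share.
conditions = {
    "Direct Code": [(F(1, 4), 43), (F(1, 4), 42), (F(1, 4), 40), (F(1, 4), 65)],
    "Embedded Code": [(F(3, 4), 64), (F(1, 4), 32)],
    "Implicit Code": [(F(2, 3), 50), (F(1, 9), 64), (F(1, 9), 40), (F(1, 9), 32)],
}


def weighted_poverty(shares):
    """Classroom-weighted mean of school poverty percentages."""
    # The classroom shares within a condition must account for every classroom.
    assert sum(weight for weight, _ in shares) == 1
    return float(sum(weight * poverty for weight, poverty in shares))


for name, shares in conditions.items():
    print(f"{name}: {weighted_poverty(shares):.1f}% mean school poverty")
```

Under these assumptions the Embedded Code classrooms sit in schools with a higher average poverty level than the Direct Code classrooms. The exact values depend on the assumed even splits and should be read as illustrative, not as figures reported by Foorman et al.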
Fewer of the students who received the Direct Code intervention came from poor homes than the students in the other two conditions (Foorman, Francis, et al., 1998). It is not surprising, then, to find that the initial phonological processing scores (Table 3, p. 43) and word pronunciation scores (Table 4, p. 44) of the Direct Code group are higher than those of any other group. (It is possible that some benefit from Direct Code instruction was already showing itself since the initial assessment took place in October after the interventions were underway, but this possibility is not suggested in the Foorman et al. text.) Using the figures in Table 3, we calculate that the Direct Code first graders had significantly higher initial phonological processing scores than the Embedded Code first graders (t=3.21, p<.01). This is noteworthy because the advantage of the Direct Code intervention appears only with first graders, and the only serious rival to the Direct Code intervention in this study was the Embedded Code intervention. Foorman et al. employed sophisticated statistical techniques, but there is no statistical remedy for a prior difference among groups in a psycholinguistic ability demonstrably important for learning to read.
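For readers who wish to check a comparison of this kind against published group means, standard deviations, and sample sizes, a t statistic can be computed from summary statistics alone. The sketch below assumes a pooled-variance (Student's) t test for two independent groups, which is one common choice and may not be the exact procedure used for the value reported above. The function name `pooled_t` is illustrative, and the numbers passed to it are hypothetical placeholders, not the figures from Table 3.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic and degrees of freedom from summary statistics."""
    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    # Standard error of the difference between the two group means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# HYPOTHETICAL placeholder values for illustration only (not from Table 3).
t, df = pooled_t(10.0, 2.0, 10, 8.0, 2.0, 10)
print(f"t({df}) = {t:.2f}")
```

The resulting statistic would then be compared with the critical value of the t distribution at the given degrees of freedom.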
The Foorman et al. text appears to assume that instructional methods equal teaching. But programs don't teach, teachers do. A teacher's understanding of and commitment to an instructional strategy are critical. Classroom research generally shows that teachers make a larger difference in students' growth as readers than the methods those teachers are nominally using (Bond & Dykstra, 1967; Hoffman, 1991). Between-teacher variation has usually proven to be greater than between-method variation after taking account of variation in initial student characteristics. Thus, teacher professional development, stressed as an important key to improving students' reading achievement (Lyon, 1998; Snow et al., 1998), may be more realistic than finding the "right" method. Similarly, strong school effects on student attainment in reading (Puma et al., 1997; Stringfield, Millsap, & Herman, 1997) suggest the importance of professional development for all school staff members, including administrators and support personnel outside the classroom, not simply classroom teachers.
Possibly because the instructional method nominally being used is equated with teaching, Foorman et al. may not have thought it important to detail what was actually taught in the full language arts curriculum. In the study, the conditions differed in 30 minutes of daily instruction, which emphasized direct instruction in the alphabetic principle (Direct Code), spelling patterns in predictable books (Embedded Code), or what is described as a "whole language" philosophy of teacher as facilitator but not direct instructor (Implicit Code). However, the remaining 60 minutes of the reading and language arts program were not described for any of the classrooms.
Moreover, in addition to the 30-minute instructional variation related to the alphabetic principle, students in each of the treatment conditions received 30 minutes of instruction within a "tutorial" that was not described. Title I teachers taught the students in the tutorial. At times, the same teachers taught students from each of the treatment conditions. Further, depending on the needs of the students, one-on-one instruction was given at some times and small-group instruction at others. Thus, little information is provided on 75% of the students' reading program (the 60 remaining classroom minutes plus the 30-minute tutorial, out of 120 minutes in all), portions of which were taught by the same teacher across treatments. The researchers seemed to assume that descriptors such as "a literature-rich environment" (p. 39), "print-rich environment" (p. 40), "district's standard tutorial based on Clay's (1993b) method," or even "Direct Code" and "Implicit Code" were specific enough to convey to practitioners what students within each treatment group experienced.
An appendix that lists, but does not describe, eight components in each condition raises more questions than it answers. For example, the Direct Code condition included "writing," the Embedded Code condition included "writing (shared, independent)," and the Implicit Code condition included "writing workshop, process." From the descriptions, we cannot know what occurred in each condition. We do not know what percentage of the 30 minutes in the Direct Code condition was devoted to phonemic awareness and phonics instruction versus use of the anthology and guided and independent exploration. We do not know how much time in the Embedded Code condition was spent working with word patterns (i.e., decoding) versus reading and making sense of connected text. Yet we are essentially asked to believe that the significant differences in the study are attributable to the decoding instruction: "Children who were directly instructed in the alphabetic principle improved in word-reading skill at a significantly faster rate than children indirectly instructed in the alphabetic principle through exposure to literature" (Foorman, Francis, et al., 1998, p. 51).
Decades of research have documented that systematic word recognition instruction is necessary to teach many children to read (Bond & Dykstra, 1967; and see reviews by Adams, 1990; Anderson, Hiebert, Scott, & Wilkinson, 1985; Chall, 1967). However, direct instruction in the alphabetic principle is only part of what happened in the experimental conditions in the Foorman et al. study. We cannot conclude from this study that "Results show advantages for reading instructional programs that emphasize explicit instruction in the alphabetic principle for at-risk children" (Foorman, Francis, et al., 1998, abstract, p. 37). More appropriately, we could say that results show a small advantage for at-risk children on restricted measures for a reading instructional program that, along with an emphasis on explicit instruction in the alphabetic principle, also included reading practice, literature, writing, spelling, and one-on-one or small group Title I instruction.
In summary, there are several points to be made about the failure of the text to provide a good account of classroom instruction. Quite possibly, features of instruction, in addition to the approach to teaching the code, varied across the four conditions and contributed to the observed effects. For instance, the program used in the Direct Code condition is known for having challenging reading selections at the end of the first grade. These selections may have included more of the difficult words on the word reading test or more opportunity to practice decoding difficult words. At the very least, the supposedly constant 75% of the program is necessary, although apparently not sufficient, for strong growth in phonological awareness and word pronunciation. Certainly, the total reading and language arts program contributed to learning to read connected text, write, speak, comprehend oral language, and appreciate literature, to name a few of the facets of literacy that went unmeasured in the Foorman et al. study.
The Foorman et al. text equates "training" and "retraining" with professional development. The text mentions the initial "training" sessions and teachers' "delivery" of a particular method. It speaks of teachers' "compliance" with the conditions for the different classroom reading programs. It states that attempts to "retrain" four teachers who showed "0% compliance" were met with "repeated resistance" (p. 42). This language shows little respect for teachers, little understanding of sound approaches to professional development (Richardson & Placier, in press), little understanding of the complexities of classroom settings and instruction, and little appreciation of teachers' ability to adjust and adapt classroom programs to meet the needs of particular groups of children.
If teachers' authority and professionalism are not respected, their sense of personal agency is undermined, and they may teach in narrow, formulaic ways (Richardson, 1998). This means that, rather than attending to the individual differences and needs of their students, they are more likely to focus on fidelity to a particular program, whether a commercially prepared basal reading program or a researcher-created program for teaching some subset of the curriculum. Moreover, powerless teachers may breed powerless students, and children of poverty are already at risk in this respect. If children are to become strong, independent learners and problem-solvers, schools must be places that honor the personal agency of all participants.
Other research points to the promise of a collaborative, problem-solving approach to the professional development of teachers (Richardson & Placier, in press) and of a dual model, one that links ongoing support for teachers' professional development to the improvement of literacy achievement (Au & Carroll, 1997). There is no reason to believe that compliance with a particular instructional method enables teachers to address the varying needs of diverse students.
The four assumptions that we detail above provide insight into how researchers' implicit and explicit beliefs serve both to focus and to constrain data interpretation. Among the measures included in the study, only the results pertaining to measures of letter-sound analysis and word pronunciation are discussed at any length. In fact, the investigators failed to find significant differences on more than half their measures, including all of those addressing reading beyond the letter and word level. The authors began with ten measures. When they aggregated scores from the two Torgesen-Wagner phonological processing tests, they were left with nine. They also combined the Woodcock-Johnson letter-word identification and word attack (pseudoword) tests into the Basic Reading cluster, leaving eight measures. The results for the WISC-R are not reported, leaving seven. Of these, no significant differences were found on four measures: the Peabody Picture Vocabulary Test, Woodcock-Johnson passage comprehension, Formal Reading Inventory, and Kaufman spelling dictation test. In the presentation and discussion of results, the text fails to acknowledge the narrow and highly specific nature of the findings. It fails to break out of the closed circle of decoding instruction and decoding tests.
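The count behind "more than half" can be checked directly from the figures given in this paragraph. The short Python sketch below only retraces that arithmetic; the variable names are ours, introduced for illustration.

```python
# Retrace the measure count described in the text (variable names are illustrative).
initial_measures = 10

# Each step described in the text removes one measure from the reported set.
reductions = {
    "two phonological processing tests aggregated into one score": 1,
    "letter-word identification and word attack combined into Basic Reading": 1,
    "WISC-R results not reported": 1,
}
reported_measures = initial_measures - sum(reductions.values())

# Measures on which no significant differences were found, as listed in the text.
nonsignificant = [
    "Peabody Picture Vocabulary Test",
    "Woodcock-Johnson passage comprehension",
    "Formal Reading Inventory",
    "Kaufman spelling dictation test",
]

share_nonsignificant = len(nonsignificant) / reported_measures
print(f"{len(nonsignificant)} of {reported_measures} reported measures "
      f"({share_nonsignificant:.0%}) showed no significant differences")
```

Four of seven is roughly 57%, which is the basis for saying that more than half of the reported measures showed no significant differences.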
Results on any but the decoding tests are downplayed. Children in the Implicit Code-Research group were found to have significantly more positive attitudes toward reading than those in the Direct Code and Implicit Code-Standard groups. Researchers such as Guthrie et al. (1996) have demonstrated the importance of student engagement for developing successful readers. Children in the Implicit Code-Standard group received teacher ratings indicating significantly more behavioral and academic problems than children in the other groups. The presence of more children with problems obviously affects the chances of learning to read for all of the students in these classrooms.
Below we provide a different analysis (decidedly our own) of the Foorman et al. findings, accompanied by a different set of conclusions. First, Foorman et al. downplay an important difference among students--grade level. They grouped nonreaders from both first and second grade together, despite conspicuous differences in the ability of first and second graders to pronounce words on a baseline measure. An inspection of the word reading scores in October, shown in Table 4 of the article, reveals that the first- and second-grade students were very different from one another. In the three experimental conditions (Direct Code, Embedded Code, and Implicit Code-Research), the first graders had .20, .18, and .07 words correct, whereas the second graders had 5.73, 4.75, and 5.12 correct.
As we look at second grade, inspecting the means in Table 3 in the Foorman study, it is apparent that there were only slight differences among second graders in the Direct Code, Embedded Code, and Implicit Code-Research groups in October to April gains in phonological processing score. Likewise, it is clear from the means in Table 4 (Foorman, Francis, et al., 1998) that there were negligible differences among second graders in the Direct Code, Embedded Code, and Implicit Code-Research groups in October to April gains in word pronunciation. To the extent that there are reliable differences among instructional groups on these two measures, it follows that the differences must be attributable to first graders rather than second graders. However, Foorman et al. provide only analyses pooling first and second graders, in which age (meaning essentially grade) was a covariate. Thus, in a roundabout way, because of the covariance adjustment, the Direct Code condition looks relatively better in these pooled analyses because a smaller percentage of second graders were assigned to this condition (25%) than to the other conditions (42%, 33%, and 50%). The most relevant analysis, a conventional test for a grade-by-treatment interaction, was not conducted.
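How much the grade mix alone can move a pooled score is easy to see with the October word reading means quoted above. The Python sketch below holds the Direct Code first- and second-grade means fixed (.20 and 5.73 words) and varies only the share of second graders across the four reported percentages. Holding the grade means fixed is our simplifying assumption for illustration, not an analysis conducted by Foorman et al., and the function name is hypothetical.

```python
def pooled_mean(first_grade_mean, second_grade_mean, share_second_grade):
    """Weighted mean of two grade-level means for a given share of second graders."""
    share_first_grade = 1.0 - share_second_grade
    return share_first_grade * first_grade_mean + share_second_grade * second_grade_mean

# October word reading means for the Direct Code condition, as reported in the text.
FIRST_GRADE_MEAN = 0.20
SECOND_GRADE_MEAN = 5.73

# Shares of second graders reported for the four conditions (Direct Code is 25%).
shares = [0.25, 0.42, 0.33, 0.50]

for share in shares:
    mean = pooled_mean(FIRST_GRADE_MEAN, SECOND_GRADE_MEAN, share)
    print(f"{share:.0%} second graders -> pooled October mean {mean:.2f} words")
```

With identical performance within each grade, the pooled mean ranges from about 1.6 words at a 25% share to about 3.0 words at a 50% share. Composition alone produces that spread, which is why an analysis by grade is needed to separate instructional effects from grade mix.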
Why were there fewer second graders in the Direct Code group than in the other three groups? The answer seems to be that fewer second graders in Direct Code classrooms qualified for Title I and Foorman et al. wanted to limit participation to students who had access to Title I services. The problem is that matching on one factor, such as whether a child qualifies for Title I services, does not succeed in equating unequal groups (Cook, Campbell, & Peracchio, 1990). The assumption underlying the procedure used by Foorman and her colleagues is that students in the less privileged school would regress toward the mean of students in the more privileged school with whom they had been matched. However, both theory and experience teach us that they will regress toward the mean of the poor children in their own school (Cook et al., 1990).
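The point about regression toward the mean can be made concrete with the textbook expectation for a retest score, which pulls an observed score toward the mean of the population to which the child actually belongs. In the Python sketch below, the school means, the matched score, and the test-retest correlation are all hypothetical values chosen for illustration; none comes from the Foorman et al. data.

```python
def expected_retest(observed, population_mean, reliability):
    """Expected retest score under simple regression toward the population mean.

    Uses the textbook relation E[retest] = mean + r * (observed - mean),
    assuming equal variances on both testing occasions.
    """
    return population_mean + reliability * (observed - population_mean)

# Hypothetical values, for illustration only.
MATCHED_SCORE = 90.0             # two children matched on the same observed score
LOWER_POVERTY_SCHOOL_MEAN = 100.0
HIGHER_POVERTY_SCHOOL_MEAN = 85.0
RELIABILITY = 0.7                # assumed test-retest correlation

child_a = expected_retest(MATCHED_SCORE, LOWER_POVERTY_SCHOOL_MEAN, RELIABILITY)
child_b = expected_retest(MATCHED_SCORE, HIGHER_POVERTY_SCHOOL_MEAN, RELIABILITY)

print(f"Child in lower-poverty school:  expected retest {child_a:.1f}")
print(f"Child in higher-poverty school: expected retest {child_b:.1f}")
```

Although matched at the outset, the two children are expected to drift apart under these assumed values (to about 93 and about 88.5), each toward the mean of his or her own school. Matching on a single factor therefore does not equate the groups.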
The fact that there was no difference among instructional conditions with second graders implies that lack of understanding of the alphabetic principle is not the problem, or at least not the chief problem, for low-performing second graders. It suggests that intensive phonics instruction to produce a fast start in word decoding has limits. It supports the inference that other aspects of the reading and language arts program are as important as phonics instruction for poor readers at this stage.
Turning now to first graders, Tables 3 and 4 in Foorman et al. on phonological processing and reading words from a 50-item list, respectively, show differences favoring the Direct Code condition, which are no doubt significant. The problem is that Direct Code first graders had higher initial phonological processing scores than first graders who received the other interventions--significantly higher when contrasted with Embedded Code first graders. In fact, Foorman et al. report that after adjusting for initial differences in phonological processing score, the contrast between Direct Code and Embedded Code on improvement in reading words on the word list is no longer significant, and the contrast between Direct Code and Implicit Code-Research is truncated, although still significant.
On the end-of-year achievement tests, the Direct Code group performed significantly better than the other groups on the composite measure of letter-word identification and pseudoword decoding from the Woodcock-Johnson battery. The Direct Code group also performed somewhat better than the Embedded Code group on the Woodcock-Johnson passage comprehension measure, although the difference was not significant when a conservative criterion was employed. Notably, end-of-year scores were NOT adjusted for initial phonological processing score. If this had been done, the advantage of the Direct Code group might have disappeared because of its initial advantage in phonological processing.
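The adjustment at issue is the usual covariance adjustment of a group mean. As a reminder of the standard formula (in our notation, not taken from Foorman et al.), let $\bar{Y}_j$ be the end-of-year mean of group $j$, $\bar{X}_j$ that group's mean initial phonological processing score, $\bar{X}$ the grand mean of that score, and $b_w$ the pooled within-group regression slope:

```latex
\bar{Y}_{j}^{\text{adj}} = \bar{Y}_{j} - b_{w}\,\bigl(\bar{X}_{j} - \bar{X}\bigr)
```

With a positive slope, a group that starts above the grand mean, as the Direct Code first graders did on phonological processing, has its end-of-year mean adjusted downward. The unadjusted advantage may therefore overstate the effect of instruction.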
We conclude that, even when one considers the final set of word-oriented measures, the Foorman et al. results provide little credible evidence for the superiority of the Direct Code method. First, the range of evidence is limited. The Direct Code students appeared to learn what they were taught somewhat better than the students in the other groups, but this advantage did not appear to transfer to everyday reading and writing tasks. Second, a fundamental problem is that the groups that received the four methods were not initially equivalent. The Direct Code students came from schools with lower poverty levels on average and had higher October phonological processing scores. When performance on later tests is statistically adjusted for differences in initial phonological processing, the advantage of the Direct Code group shrinks and in many cases is no longer significant. Since initial phonological processing score is just one indicator of an array of initial differences between groups, it is not safe to conclude that, whatever the initial level of students, Direct Code instruction was responsible for steeper "growth curves." The varying trajectories of the students may have been impelled by home factors and/or other school factors. Differences among groups of students can neither be removed using a statistical adjustment for phonological processing nor neutralized by matching students on qualification for Title I services. Two matched children will have a different experience depending upon the level of poverty of classmates in the schools they attend (Puma et al., 1997). The one in the school serving more poor children may have classmates who give less attention to academic tasks; may experience a slower pace of instruction; or may have fewer peer models of successful, engaged reading and writing.
We have treated the Foorman et al. text as an example of research that has been overly promoted by the media and misused by some policymakers and educational leaders in the search for answers to improving children's reading achievement. The Foorman et al. study is not, in itself, one on which policy decisions should be based, though it continues to receive an extraordinary amount of attention. Despite serious methodological weaknesses, this study has been widely cited by the press (Foorman, 1996) and is being used across the United States to support the trend toward mandated instruction in phonemic awareness and phonics for beginning readers of all levels and abilities (Manzo, 1997, 1998b). Since 1996, when information about the Foorman et al. study was first released, 67 bills to make phonics the law have been proposed in states around the country (Manzo, 1998b). Legislation or mandates have already been implemented in states including Texas, Washington, California, New York, and Wisconsin (Allington & Woodside-Jiron, 1998; Manzo, 1998b).
Why has this study had such an impact? First, it has had unusually extensive media coverage. Second, we believe that claims made on the basis of this flawed study feed the false hopes of many Americans, including policymakers, educators, and the general public, that we can find a single, simple solution, such as directly teaching phonics, to the real and complex problem of improving the reading of young children in high poverty schools. Policymakers and educators feel the urgency of finding an easy answer and producing results. Foorman et al. appear to present just such an easy answer in the last line of their article, by suggesting that widespread reading failure might be prevented through explicit teaching of the alphabetic principle. Further, when the authors of this widely publicized study use their results as the basis for promoting specific commercial programs such as Open Court and SRA Reading Mastery (Foorman, Fletcher, & Francis, 1998), they contribute to the impression that students' reading problems will be solved if a school simply buys the right program.
Policymakers are under pressure to respond to the public's perception of a crisis in American education. While we agree that improvement in literacy education is needed, especially for children living in poverty, we do not believe there is a crisis. Berliner and Biddle (1995) provide compelling evidence that crises have been "manufactured" to discredit public education and serve various political ends. Levin (1998) points out that, while the schools were blamed for a poorly educated labor force that led to a decline in American economic competitiveness in the late 1970s and early 1980s, they were not given credit for a well-educated labor force and the booming economy of the 1990s.
In the case of literacy, the compulsion to respond to a perceived crisis leads some school districts to adopt what they believe to be "teacher-proof" commercial materials for reading instruction. This is a short-sighted response because it fails to give appropriate weight to the teacher, along with many other elements of the schooling context (e.g., high-quality instruction tailored to meet individual needs, strong home-school relationships, systematic evaluation of pupil progress) in explaining the growth of poor children's reading ability (Taylor & Pearson, 1999). No school, no classroom, no child is exactly like any other. Good teaching of reading, or any other subject, cannot simply be a matter of using the "right" method, because any method may be more or less effective depending on its fit with the school, the classroom, the teacher, and the needs of individual children.
Belief in a crisis lends support to spending considerable sums on studies of small parts of the total reading process, studies that appear to offer easily translated implications for policy and practice. Experiments on word-level processes, such as that conducted by Foorman et al., are one type of reading research, maybe the only type, that already receives ample funding. The annual budget for NICHD reading research is $14 million, and since 1983 a total of $104 million has been spent on NICHD reading research (Lyon, 1996). Priority should be given to funding research on equally important but less thoroughly investigated aspects of the reading process, such as comprehension and composition, as well as on the broader context for learning to read, such as teachers and classroom instruction, the school, and the family and community.
Educational researchers have a responsibility to interpret research in terms of its implications for policy. Often, these interpretations are more complex than policymakers might wish, as noted in the articles in the special issue of the Educational Researcher on the relationship of educational research to policy (Good, 1996). One of the roles of researchers in discussions of policy should be to encourage the recognition and understanding of complexities and to forestall a rush to simplistic solutions. At the same time, as Cooper (1996) has cautioned, researchers should be aware of how to formulate their findings, so that policymakers do not view the statement "it depends" as a prescription for inaction.
Literacy research documents an array of practices important for struggling beginning readers (Anderson, 1998; Au, 1998; Hiebert & Taylor, in press; Pikulski, 1994; Snow et al., 1998; Spear-Swerling & Sternberg, 1996). These practices include systematic instruction in word recognition, carefully selected texts, repeated reading, guided writing, regular assessment of pupil progress, extra time in reading, one-on-one tutoring, strong home connections, and ongoing staff development. Knapp and associates (1995) found that instruction emphasizing meaning, as opposed to basic skills, in high-poverty classrooms "produces superior learning of advanced skills and comparable or better learning of 'basic' skills by both high and low achievers" (p. 184). Although the solutions are not simple, a great deal is known about what it takes to help most children read at average levels by third grade. Educational researchers should assume responsibility for alerting policymakers to the breadth of relevant reading research. As illustrated by the case of Foorman et al., in the haste to find a simple solution to perceived shortcomings in early reading instruction, whole categories of research-based practice may not even be considered in policy discussions.
We live in an age when achievement test results are seen by many as the basis for determining the progress of American students and the strength of the public schools. For this reason, it is essential for educators to help policymakers understand the importance of using forms of evaluation suitable for the assessment of complex processes such as reading. Decisions should hinge on significant outcomes (e.g., comprehension of books) as opposed to trivial ones (e.g., pronunciation of nonwords). If we are satisfied to define reading simply as being able to pronounce words, it is relatively easy to decide whether students in particular programs have learned to read and all too easy to make decisions about students and schools. However, if we want our children to be able to do more than pronounce lists of words, we face a more formidable task. We must base decisions on assessments of students' ability to comprehend, appreciate, interpret, and critically analyze texts, taking into account large variations in their linguistic experience and background knowledge.
The case of Foorman et al. ought to alert the educational research community, including editors and reviewers, to the importance of setting its work within the broader context of the field (for example, in the Foorman et al. study, specifying that word identification rather than text reading served as the primary index of reading) and being very clear about the limitations of each study (for example, that the results are based only on the reading of words and nonwords in isolation, letter-sound knowledge, and phonological awareness). Because of the complexity of the reading process and increased public interest in some reading research, researchers bear the responsibility for explaining how their work fits into the larger picture of literacy education and identifying its contributions to the converging body of credible evidence established in the field. In our opinion, the Foorman et al. text falls short in this regard, particularly in its claim that instruction in the alphabetic principle may well "prevent reading failure for large numbers of children" (p. 52). If we as researchers fail to fulfill the responsibility for showing how our work fits within the larger context, we will continue to see research findings overgeneralized and used to support policies mandating narrowly drawn instructional practices.
Discretion is certainly warranted in promulgating conclusions from a single study or a single line of research. We must be mindful that, particularly in schools in low-income communities, issues of curriculum and instruction must be considered along with factors such as poverty, violence, drug abuse, and deteriorating buildings. Policies likely to be successful in raising literacy throughout the nation will address the wider systems that promote or hinder children's learning to read.
We are indebted to Barbara Foorman and David Francis for patiently answering a number of questions, providing further information about the study, allowing us to examine the word reading test, and describing additional analyses of the data. We also wish to thank Robert Schwartz and James Hoffman for commenting on an earlier version of the manuscript. Correspondence concerning this article should be addressed to Barbara Taylor, College of Education and Human Development, University of Minnesota, 159 Pillsbury Dr. SE, Minneapolis, MN 55455.
This research was supported, in part, by CIERA, the Center for the Improvement of Early Reading Achievement, which, in turn, is supported under the Educational Research and Development Centers Program, PR/Award Number R305R70004, as administered by the Office of Educational Research and Improvement, U.S. Department of Education. However, the contents of the described report do not necessarily represent the positions or policies of the National Institute on Student Achievement, Curriculum, and Assessment or the National Institute on Early Childhood Development, or the U.S. Department of Education, and you should not assume endorsement by the Federal government.
Agronin, M. E., Holahan, J. M., Shaywitz, B. A., & Shaywitz, S. E. (1992). The multi-grade inventory for teachers. In S. E. Shaywitz & B. A. Shaywitz (Eds.), Attention deficit disorder comes of age (pp. 29-67). Austin, TX: PRO-ED.
Allington, R. L. (1991). Children who find learning to read difficult: School responses to diversity. In E. H. Hiebert (Ed.), Literacy for a diverse society: Perspectives, practices, and policies (pp. 237-252). New York: Teachers College Press.
Banks, J. A. (1995). Multicultural education: Historical development, dimensions, and practice. In J. A. Banks & C. A. M. Banks (Eds.), Handbook of research on multicultural education (pp. 3-24). New York: Macmillan.
Byrne, B., & Fielding-Barnsley, R. (1995). Evaluation of a program to teach phonemic awareness to young children: A 2- and 3-year follow-up and a new preschool trial. Journal of Educational Psychology, 87 (3), 488-503.
Cook, T. D., Campbell, D. T., & Peracchio, L. (1990). Quasi experimentation. In M. Dunnette & L. Hough (Eds.), Handbook of industrial and organizational psychology, Vol. 1 (2nd ed.) (pp. 491-576). Palo Alto, CA: Consulting Psychologists Press.
Fletcher, J. M., Shaywitz, S. E., Shankweiler, D. P., Katz, L., Liberman, I. Y., Stuebing, K. K., Francis, D. J., Fowler, A. E., & Shaywitz, B. A. (1994). Cognitive profiles of reading disability: Comparisons of discrepancy and low achievement definitions. Journal of Educational Psychology, 86 (1), 6-23.
Foorman, B. R., Fletcher, J. M., & Francis, D. J. (1998). Preventing reading failure by ensuring effective reading instruction. In S. Patton & M. Holmes (Eds.), The keys to literacy (pp. 29-39). Washington, DC: Council for Basic Education.
Foorman, B. R., Francis, D. J., Fletcher, J. M., Schatschneider, C., & Mehta, P. (1998). The role of instruction in learning to read: Preventing reading failure in at-risk children. Journal of Educational Psychology, 90 (1), 37-55.
Francis, D. J., Shaywitz, S. E., Stuebing, K. K., Shaywitz, B. A., & Fletcher, J. M. (1996). Developmental lag versus deficit models of reading disability: A longitudinal, individual growth curves analysis. Journal of Educational Psychology, 88, 3-17.
Goodman, Y. M., & Goodman, K. S. (1994). To err is human: Learning about language processes by analyzing miscues. In R. B. Ruddell, M. R. Ruddell, & H. Singer (Eds.), Theoretical models and processes of reading (pp. 104-123). Newark, DE: International Reading Association.
Guthrie, J. T., Van Meter, P., McCann, A. D., Wigfield, A., Bennett, L., Poundstone, C. C., Rice, M. E., Faibisch, F. M., Hunt, B., & Mitchell, A. M. (1996). Growth of literacy engagement: Changes in motivations and strategies during concept-oriented reading instruction. Reading Research Quarterly, 31 (3), 306-332.
Hansen, J. (1998). Young writers: The people and purposes that influence their literacy. In J. Osborn & F. Lehr (Eds.), Literacy for all: Issues in teaching and learning (pp. 205-236). New York: Guilford.
Hiebert, E. H., & Taylor, B. M. (in press). Beginning reading instruction: Research on early interventions. In R. Barr, M. Kamil, P. Mosenthal, & P. D. Pearson (Eds.), Handbook of reading research, Volume III. New York: Longman.
Hoffman, J. (1991). Teacher and school effects in learning to read. In R. Barr, M. Kamil, P. Mosenthal, & P. D. Pearson (Eds.), Handbook of reading research, Volume II (pp. 911-950). New York: Longman.
Macken-Horarik, M. (1997). Relativism in the politics of discourse: Response to James Paul Gee. In S. Muspratt, A. Luke, & P. Freebody (Eds.), Constructing critical literacies (pp. 303-314). Cresskill, NJ: Hampton.
Many, J. E., Fyfe, R., Lewis, G. L., & Mitchell, E. (1996). Traversing the topical landscape: Exploring students' self-directed reading-writing-research processes. Reading Research Quarterly, 31 (1), 12-35.
Morrow, L. M. (1992). The impact of a literature-based program on literacy achievement, use of literature, and attitudes of children from minority backgrounds. Reading Research Quarterly, 27 (3), 251-275.
Purcell-Gates, V. (1998a). Growing successful readers: Homes, communities, and schools. In J. Osborn & F. Lehr (Eds.), Literacy for all: Issues in teaching and learning (pp. 51-72). New York: Guilford.
Raphael, T. E., & Goatley, V. J. (1997). Classrooms as communities: Features of community share. In S. I. McMahon & T. E. Raphael (Eds.), Book Club: Literacy learning and classroom talk (pp. 26-46). New York: Teachers College Press.
Stanovich, K. E., & Siegel, L. (1994). Phenotypic performance profile of children with reading disabilities: A regression-based test of the phonological-core variable-difference model. Journal of Educational Psychology, 86 (1), 24-53.
Stringfield, S., Millsap, M. A., & Herman, R. (1997). Urban and suburban/rural special strategies for educating disadvantaged children: Findings and policy implications of a longitudinal study. Washington, DC: U.S. Department of Education.
Taylor, B. M., & Pearson, P. D. (1999). Beating the odds in teaching all children to read: Lessons from effective schools and effective teachers. Paper presented at the NCREL Regional Reading Summit, Chicago.
Taylor, D. (1998). Beginning to read and the spin doctor of science: The political campaign to change America's mind about how children learn to read. Urbana, IL: National Council of Teachers of English.
Vellutino, F. R., Scanlon, D. M., Sipay, E., Small, S., Pratt, A., Chen, R., & Denckla, M. (1996). Cognitive profiles of difficult-to-remediate and readily remediated poor readers: Early intervention as a vehicle for distinguishing between cognitive and experiential deficits as basic causes of specific reading disability. Journal of Educational Psychology, 88 (4), 601-638.
Wagner, R., Torgesen, J. K., & Rashotte, C. A. (1994). Development of reading-related phonological processing abilities: New evidence of bidirectional causality from a latent variable longitudinal study. Developmental Psychology, 30 (1), 73-87.
Wells, G., & Chang-Wells, G. L. (1997). "What have you learned?": Co-constructing the meaning of time. In J. Flood, S. Heath, & D. Lapp (Eds.), A handbook for literacy educators: Research on teaching the communicative and visual arts (pp. 514-527). New York: Macmillan.
Wharton-McDonald, R., Pressley, M., & Hampston, J. M. (1998). Literacy instruction in nine first-grade classrooms: Teacher characteristics and student achievement. Elementary School Journal, 99, 101-128.