Decodable Texts for Beginning Reading Instruction: The Year 2000 Basals
James V. Hoffman
Elizabeth U. Patterson
The University of Texas--Austin
Over the past decade, basal textbooks have become a virtual lightning rod in the "reading wars" (Pikulski, 1997; Strickland, 1995): Should beginning reading instruction be literature-based or skill-based? Should the language in texts be highly literary or highly decodable? Both sides in the debate have used state textbook adoption policies as an effective leverage point for change (Hoffman, in press). Educators and politicians in Texas and California in particular have played significant roles in pushing early reading instruction from one extreme position to another through shifts in textbook adoption requirements (Farr, Tulley, & Powell, 1987). The textbook policy actions taken in Texas and California are more than isolated cases and more than a reflection of national trends. These actions are shaping a national curriculum for reading. Basal publishers target their product development toward these states, and the programs that are marketed successfully in Texas and California are the ones most likely to thrive, with minimal changes, in the highly competitive national marketplace.
We have been engaged in a study of the nature and effects of changes in the texts used for beginning reading instruction (Hoffman, McCarthey, Abbott, Christian, Corman, Dressman, et al., 1994; Hoffman, McCarthey, Elliot, Bayles, Price, Ferree, et al., 1998; McCarthey, Hoffman, Elliott, Bayles, Price, Ferree, et al., 1994), and have documented changes in basal reading programs that result from the state mandates in Texas for more literature-based teaching practices and materials. Further, we have described some of the ways in which these changes in the textbooks have influenced instructional practices.
We are continuing to explore the most recent changes in basal texts associated with the Year 2000 requirements for reading textbooks in Texas. These changes constitute a dramatic reversal in perspective and priorities from the adoptions of the previous two decades. Literature-based teaching principles and practices and the valuing of quality literature have been pushed to the background. In their place, we find a growing emphasis on vocabulary control tied to more explicit skills instruction. These events are clearly driven by state policy initiatives. Just as the Texas adoptions of the early 1990s required publishers to use authentic children's literature as the texts for beginning reading instruction, the Year 2000 Texas mandates imposed severe restrictions on vocabulary and required explicit skills teaching. This report focuses specifically on the Texas state basal reading adoption for the Year 2000 and the impact of these new mandates on program features.
To fully appreciate the magnitude of the changes that have taken place over the past two decades, it is necessary to briefly review the history of basals and their use in the United States over the past century. While the "one reading book per grade level" principle can be traced back to the mid-nineteenth century and McGuffey's (1866) readers, the term "basal" was not used to describe commercial programs until the early twentieth century (Hoffman, in press). In its early use, the term "basal" was used less to identify an "approach" than to describe a commercial program that employed different readers for each grade level. Many of these early series used the term "progressive" in their titles, not to imply a "new approach" to teaching reading, but to describe the leveled nature of the books in the program. It was the growing consensus surrounding the "look-say" method, popularized in basals in the mid-1950s, that led to the association of basals with a particular approach or philosophy. This consensus was epitomized in Scott Foresman's "Sally, Dick and Jane" readers (Smith, 1965). Repeated practice with the same small set of words was seen as the key to promoting decoding abilities. Vocabulary control was the primary factor used in leveling these texts, from the pre-primers through the primers and into the first readers.
In classrooms, basals were dominant through the 1950s and 1960s (Austin & Morrison, 1963). They were not revered in all quarters, however. In fact, traditional "look-say" basals came under severe attack in both the popular press (Flesch, 1955) and the scholarly press (Chall, 1967). Most of these criticisms focused on the lack of attention to systematic phonics instruction.
Basals changed in the 1970s and 1980s. Helen Popp (1975) described the changes during this period in terms of an increase in vocabulary (a loosening of control) and in the number of skills taught. However, she lamented the mismatch between the skills taught and the words read. Rudolf Flesch (1981) was less generous, describing the changes as superficial and as leading the new basals even further off track than their predecessors of the 1950s. Flesch's "dirty dozen" (i.e., the dominant and most popular basals) continued to avoid explicit skills instruction and relied too heavily on sight word teaching. Flesch argued for the "fabulous four" (i.e., phonic linguistic programs, such as Lippincott's) that provided explicit skills instruction, with practice in materials that required the reader to apply the skills taught.
By the mid-1980s, no group seemed willing to defend the status quo in basals; basal "bashing" was on the rise (e.g., Shannon, 1987). Basals were attacked from the "code emphasis" side as unsystematic (Beck, 1981), and from the "meaning emphasis" side as trivial and boring (e.g., Goodman & Shannon, 1988). Adding fuel to the fire of criticism, national assessments continued to point out the failure of schools to meet the literacy needs of all learners, in particular the needs of minority children (Mullis, Campbell & Farstrup, 1993). Advocates of a literature-based approach to beginning reading instruction argued for expanded criteria (i.e., beyond vocabulary control and skills match) for judging the adequacy of texts used for beginning reading instruction (Galda, Cullinan, & Strickland, 1993; McGee, 1992; Wepner & Feeley, 1993). These expanded criteria included the quality of the literature, the predictability of the text structures, and the quality of the design. While some attempts, such as Rhode's (1979) criteria for predictable texts, were made to quantify these values into specific standards, more often the call for quality took the form of a call for more "authentic" literature. Operationally, policymakers and program developers interpreted "authentic" to mean that the literature used in basals must have first appeared as a published trade book. Stories written "in-house" by basal authors and editors were discredited. The California and Texas adoptions in the early 1990s required basal publishers to attend to the quality of literature.
Our comparison of the literature-based basals (1993 editions) targeted for the Texas market to the skills-based basals (1987 editions) confirmed that the policy mandate for more quality literature in the texts for beginning reading had been successful (Hoffman, et al., 1994). Ratings on the engaging qualities of text, which focused on content, language, and design features, were found to be significantly higher for the literature-based basals. Further, our analysis revealed that predictability was being used far more often than in the past as a support for students reading challenging texts. However, lost in the enthusiasm for authentic literature was any systematic attention to the decoding demands of the texts. In fact, decoding demands increased dramatically with the new programs, and vocabulary control all but disappeared.
It became clear as we studied the implementation of these programs in Texas classrooms in the mid-1990s that many readers struggled with the challenge level of the materials (Hoffman, et al., 1998). This problem was particularly severe in schools serving large populations of "at-risk" students, especially at the start of the first-grade year. In its 1996 adoption, the California legislature demanded that publishers attend to more explicit teaching of skills, but it offered no specific requirements for the decodability of the text. Basal publishers responded. At the same time, there was an influx of "little books" specifically designed to support the development of decoding. Early on, these little books were imported directly from New Zealand, where they were used in association with the Reading Recovery program. Basal publishers in the United States then began to produce similar materials to support decoding.
Menon and Hiebert (1999) analyzed the basal anthologies and little books (Martin & Hiebert, 1999) published during this period, using a computer-based text analysis program (Martin, 1999) to estimate the decodability of the words presented (Figure 1). Following their procedures, each word appearing in a text is classified on a scale from 1 (representing the easiest decoding demands, e.g., words with consonant-vowel-consonant patterns) to 8 (representing the most complex decoding demands, e.g., multisyllabic words and words with irregular phonic patterns). The decodability rating for a text was the average of the ratings of the words it presented. Despite the concerns expressed in this California adoption, Menon and Hiebert found little evidence of any systematic attention to decodability.
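The averaging step can be illustrated with a short Python sketch. It is not the CIERA program: the word-to-rating table is hypothetical (the values are invented for the example), and we assume the average is taken over every running word in the text rather than over distinct words.

```python
import re

# HYPOTHETICAL ratings on the 1 (easiest, e.g., CVC words) to 8 (most complex,
# e.g., multisyllabic or irregular words) scale. Illustrative values only.
HYPOTHETICAL_RATINGS = {
    "cat": 1, "sat": 1, "mat": 1,
    "on": 2,
    "the": 4,
    "together": 8,
}

def average_decodability(text, ratings):
    """Mean rating over all running words; assumes every word is in `ratings`."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(ratings[w] for w in words) / len(words)

print(round(average_decodability("The cat sat on the mat.", HYPOTHETICAL_RATINGS), 2))
```

Here the six running words sum to 13, so the text rating is 13/6, about 2.17. Averaging over running words means a frequently repeated hard word pulls the rating up more than a word that appears once.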
The historical trends in basals that led up to the Year 2000 adoption in Texas are only one reference point for the current study. It is just as important to offer a theoretical reference point for understanding "leveled" texts in beginning reading and the role they play in the development of decoding skills. We use the term "leveled" to refer to texts that are graduated in difficulty or challenge level. The term "leveled text" includes both the traditional pupil texts found in basal reader programs and the many "little books" currently marketed separately or in conjunction with basal reader programs (Roser, Hoffman, & Sailors, in press). The current study is grounded in a theoretical framework that draws attention to a set of key text factors promoting the acquisition of decoding skills (Hoffman, in press). This framework posits three major factors as important in the leveled texts used in beginning reading: instructional design, accessibility, and engaging qualities.
The instructional design factor addresses the question of how the words in a leveled text's selections reflect an underlying instructional strategy for building decoding skills. Certainly, Beck's (1981, 1997) writings, as well as the recent mandates for decodable text in the State of Texas, reflect a concern tied to instructional design. This valuing of instructional consistency, the alignment of skills taught and words read, is not the only perspective one might adopt when considering instructional design. A sight word or memorization perspective, for example, might emphasize repetition and frequency over alignment of skills. Hiebert (1998) has argued that text plays a critical role in the development of decoding abilities by providing practice with words and within-word patterns. Frequent "instantiations" of patterns in a variety of contexts support the development of automaticity and independence in decoding. These instantiations may take the form of repeated high-frequency words or repeated common rimes (e.g., -og, -ip). Text with a strong instructional design for beginning readers provides repeated exposure to these patterns, starting with the simplest, most common, and most regular words, and then building toward less common, less regular, and more complex words. Hiebert and her colleagues have developed a software program--the CIERA TexT Analysis Program (Martin, 1999)--that assesses these qualities. The program produces a text analysis that identifies, for example, the number of different rimes, the number of instantiations of each rime, and the repetition rate of high-frequency words. In addition, the program analyzes the ratio of total words to unique words, referred to as the "density" of the text--that is, the average number of words a reader would encounter before meeting a unique (i.e., new) word. Text that supports the development of decoding must attend to all of these factors.
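Two of these counts, density and rime instantiations, can be illustrated with a short Python sketch. It is not the CIERA program and does not reproduce its rules: it assumes a crude rime rule (the first vowel letter and everything after it, reasonable only for one-syllable words) and computes density as total words divided by unique words, following the "average number of words before a new word" reading above.

```python
import re
from collections import Counter

VOWELS = "aeiou"  # simplification: 'y' is never treated as a vowel

def rime(word):
    """Crude rime of a one-syllable word: first vowel letter onward ('dog' -> 'og')."""
    for i, letter in enumerate(word):
        if letter in VOWELS:
            return word[i:]
    return None  # no vowel letter, so no rime under this rule

def text_profile(text):
    words = re.findall(r"[a-z]+", text.lower())
    rimes = Counter(r for r in map(rime, words) if r is not None)
    return {
        "total_words": len(words),
        "unique_words": len(set(words)),
        # running words per unique word: higher values mean more repetition
        "density": len(words) / len(set(words)),
        # how many running words instantiate each rime
        "rime_instantiations": dict(rimes),
    }

profile = text_profile("The dog and the frog hop. The dog can hop.")
print(profile["total_words"], profile["unique_words"], round(profile["density"], 2))
print(profile["rime_instantiations"])
```

For the sample sentence, the sketch counts 10 running words and 6 unique words (density about 1.67), with the rime -og instantiated three times (dog, frog, dog).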
The key to evaluating the instructional design of a series of leveled texts rests on an examination of the underlying principles for the development of the program, as they relate to the words that students are expected to read in texts.
As evidenced in this historical review, the leveling of texts to provide for "small steps" in growth has been a primary focal point of debate. Traditional readability formulas--quantitative estimates of text difficulty--have proven less than satisfactory for differentiating texts at the early grade levels (Klare, 1984). Readability formulas are simply too atheoretical and too narrowly quantitative to capture many important dimensions of decoding and fluency development. Accessibility, in contrast, considers both the decoding demands placed on the reader to recognize words in the text and the "extra" supports surrounding the words, which assist the reader with identification, fluency, and, ultimately, comprehension. For the analysis reported in this study, accessibility in text is tied to two factors: decodability and predictability. Decodability is focused on the word level and reflects the use of high-frequency words as well as words that are phonically regular. Predictability refers to the surrounding linguistic and design support for the identification of difficult words (e.g., rhyme, picture clues, repeated phrases). Decodability and predictability can work in concert to affect the accessibility of the text. Like engaging qualities, decodability and predictability are challenging constructs to measure. However, we have again found that holistic scales, rubrics, and anchor texts lead to reliable leveling. Further, we have found that these scales have validity in relation to student performance (Hoffman, Roser, Salas, Patterson & Pennington, 2001).
No theory of text, even one focused on the development of decoding abilities, can ignore issues of content and motivation. The construct of "engaging qualities" draws on a conception of reading that emphasizes its psychological and social aspects (Guthrie & Alvermann, 1998). Engaging text is interesting, relevant, and exciting to the reader. Three factors contributing to the engaging qualities of text are represented here: content, language, and design. Content refers to what the author has to say. Are the ideas important? Are they personally, socially, or culturally relevant? Is there development of an idea, character, or theme? Does the text stimulate thinking and feeling? Language refers to the author's way of presenting the content. Is the language rich in literary quality? Is the vocabulary appropriate but challenging? Is the writing clear? Is the text easy and fun to read aloud? Does it lend itself to oral interpretation? Design refers to the visual presentation of the text. Do the illustrations enrich and extend the text? Is the use of design creative and attractive? Is there creative use of print? Of course, all of these factors are considered with reference to an assumed audience of beginning readers. Higher levels of engaging qualities are associated with greater effectiveness in supporting the development of decoding. Measuring these qualities is a formidable but not impossible task: we have achieved high levels of reliability in their coding by using a combination of rubrics, anchor texts, and training (Hoffman, et al., 1995). We have also validated these measures in relation to student preferences for text and found support for their salience (McCarthey, et al., 1994).
Whereas the presence of engaging qualities is viewed as a positive attribute for all leveled texts, the appropriate scaling of accessibility features and instructional design is assumed to vary as a function of reader development. At the earliest levels, the optimal mix for accessibility may place fewer decoding demands on the reader while providing more support through predictable features. At higher levels, the decoding demands may increase while the support offered through predictable features decreases. In the leveling of text, accessibility and instructional design must work together. Text that is highly accessible but does not push the reader to new discoveries is not useful in promoting automaticity (the instructional design factor). In contrast, text that pushes the reader into more complex patterns too quickly or haphazardly, without regard for accessibility, is of little help in promoting independence in decoding.
Two cautions are important before closing this discussion of leveled texts and decoding. First, our identification of the text factors that support decoding is not meant to devalue the role of the teacher. The three factors--instructional design, accessibility, and engaging qualities--are reference points for leveled text only. The text is a tool that helps the teacher and reader reach the goal of early reading development. The success of this effort depends directly on careful and responsive teaching. Second, these text factors may not be useful for characterizing the optimal structure of other kinds of texts important to the classroom literacy environment (e.g., trade books, reference materials, content area textbooks). Instructional design, accessibility, and engaging qualities are factors that apply to leveled texts aimed specifically at the development of decoding skills and strategies. Leveled texts must work in concert with other texts and instructional experiences to promote independent reading.
While our research does not specifically focus on the policy formation activities surrounding the recent basal adoption in Texas, some background information is useful. Five publishers submitted complete K-3 basal programs in response to the Texas Textbook Proclamation of 1998 for the Year 2000 adoption. Stringent requirements were imposed on these programs for compliance with the state curriculum and for the "decodability" of the words included in the pupil texts at the first-grade level. The decodable text construct called for in the Texas proclamation was significantly different from the construct represented in the holistic scales used in our previous research (Hoffman, et al., 1994) and in the research of Menon and Hiebert (1999). The construct applied in Texas was more closely aligned with the work of Beck (1981) and Stein, Johnson and Gutlohn (1999). This conception of decodability rests not so much on specific word (phonic) features as on the relationship between what is taught in the curriculum (i.e., the skills and strategies presented) and the characteristics of the words read. Rather than ranging on a continuum from high to low decoding demands or complexity, the Texas definition yields a yes/no decision on the decodability of each word. Following this model, the word "cat" is decodable only if the initial "c," the medial short "a," and the final "t" letter/sound associations have been taught explicitly within the program's skill sequence. A word like "together" might be defined as decodable if all of the "rules" needed to decode it had been explicitly taught before students encountered it in the text. A word that is not decodable at one point in time may become decodable after new skills are taught.
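The yes/no logic of this model can be sketched in Python. Everything specific here is hypothetical: the letter-sound unit labels, the way each word is segmented into units, and the two-lesson skill sequence are invented for illustration; an actual program would segment words according to its own published skill sequence.

```python
# HYPOTHETICAL segmentations: the letter-sound units a reader needs for each word.
WORD_UNITS = {
    "cat": ["c=/k/", "a=/short a/", "t=/t/"],
    "mat": ["m=/m/", "a=/short a/", "t=/t/"],
    "cake": ["c=/k/", "a_e=/long a/", "k=/k/"],
}

def is_decodable(word, taught):
    """Yes/no decision: every unit the word requires has already been taught."""
    return all(unit in taught for unit in WORD_UNITS[word])

# HYPOTHETICAL skill sequence, taught in order.
lessons = [
    ["c=/k/", "a=/short a/", "t=/t/", "m=/m/"],  # lesson 1
    ["a_e=/long a/", "k=/k/"],                  # lesson 2
]

taught = set(lessons[0])
print(is_decodable("cat", taught), is_decodable("cake", taught))  # True False
taught.update(lessons[1])
print(is_decodable("cake", taught))  # True: decodable once lesson 2 is taught
```

The same word ("cake") changes status as instruction proceeds, which is why, under this model, decodability is a property of a word at a point in the program rather than of the word itself.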
Decodability, as defined by the Texas Education Agency, refers to the percent of words introduced that can be read accurately (i.e., pronunciation approximated) through the application of phonics rules that were explicitly taught in the program design prior to the student encountering the word in connected text. We will refer to this as the "instructional consistency" perspective. Within this perspective, the decodability of a word is determined by the instruction that has preceded the appearance of the word in a selection.
Originally, the standard applied in the Texas review process was that an average of 51% of the words should be decodable in those selections that the publisher had designated as decodable. This standard was drawn literally from the Texas Essential Knowledge and Skills (TEKS) requirement that a "majority" of words be decodable. Later, the State Board of Education raised the standard to 80% of the words for each selection deemed decodable by the publisher. The Board did not cite any research evidence in support of the 80% level of decodability; however, some have suggested that Beck's (1997) estimate of 80% decodable as a minimum was the basis for this prescription. Eventually, all five publishers met the 80% standard, and their products were approved for use in the state (S. Dickson, personal communication, December 3, 1999).
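At the selection level, the standard reduces to a percentage compared against a threshold. A minimal Python sketch, assuming each running word in a selection has already been judged decodable or not (True/False) under the instructional-consistency model, and assuming every running word counts toward the percentage; the TEA's exact counting rules are not specified here. The 20-word selection is hypothetical.

```python
def percent_decodable(word_flags):
    """Percent of a selection's running words judged decodable (True)."""
    return 100 * sum(word_flags) / len(word_flags)

def meets_standard(word_flags, threshold_percent):
    """Compare one selection against a percentage standard (e.g., 51 or 80)."""
    return percent_decodable(word_flags) >= threshold_percent

# HYPOTHETICAL 20-word selection in which 15 words are decodable.
selection = [True] * 15 + [False] * 5

print(percent_decodable(selection))   # 75.0
print(meets_standard(selection, 51))  # True under the original 51% standard
print(meets_standard(selection, 80))  # False under the revised 80% standard
```

A selection like this one would have passed the original "majority" standard but failed the revised one, which illustrates how much the change from 51% to 80% tightened the requirement.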
Many of the procedures followed in this study replicated those used in Hoffman et al.'s (1994) study, which compared the features of the 1987 basals (characterized as skill-based) with those of the 1993 basals (characterized as literature-based). For the current study, all of the texts from the first-grade programs (2000 adoption) were entered into text files and analyzed for word-level features and vocabulary repetition patterns. As in that earlier study, trained raters assessed predictability, decodability, and engaging qualities by applying holistic scoring procedures and scales to the actual pupil text materials.
In addition to our analysis of the 2000 basals, we also reanalyzed some of the data from the 1987 and 1993 basals to allow for comparisons across the three adoption periods. We limited our historical trends analysis using these comparative data to the three programs that have been part of all three of the most recent Texas adoption cycles (1987, 1993 & 2000).
The five basal programs are identified in this report through a letter identification system. This system keeps the focus on research variables, rather than program comparisons. The data are summarized in Table 1. Five factors should be kept in mind as these program descriptions are considered.
We use the general term "anthologies" to describe the selections included in the student readers, and the term "little books" to describe the selections that appeared in ancillary reading materials. The format for the little books varied from program to program. In some programs, little books were bound books; in other programs, little books were to be constructed by the teacher from black-line masters.
The data from this study reflect an analysis of over 100,000 words and over 600 selections from the 2000 basals, combined with a re-analysis of data from two previous adoptions. Over 25 different variables were derived from the holistic scales, the TEA analysis, and the CIERA text analysis. The reporting of the data is guided by our two primary research questions. We will focus initially on describing the three major features of the texts for the Year 2000 basals as they relate to the "decodable" standards set by the state of Texas. We will then present the findings of an analysis comparing data from the 2000 basals to data from the previous two adoption cycles (1987 & 1993).
Our analysis of the data for the Year 2000 basals focused on the three major factors that we had identified as theoretically important: instructional design, accessibility (decodability and predictability), and engaging qualities.
This factor describes the importance of text that provides repeated practice with words and within-word patterns, features that are critical to the development of decoding abilities. Table 1 shows the number of selections across the five levels of each of the five programs, which ranged from a low of 100 to a high of 160. The data are broken down by whether the publishers designated each selection as decodable or non-decodable. About half of the selections across programs were labeled as decodable by the publishers (ranging from 30% in Program E to 96% in Program A). The total number of words in the programs ranged widely, from 13,793 to 25,928. The total number of unique words ranged from 1,740 to 3,287. "Unique words" refers to the number of different words, calculated within each program. Both the average number of words per selection, F(4,592) = 38.53, p < .001 (Table 2), and the average number of unique words per selection, F(4,592) = 62.64, p < .001 (Table 3), showed a statistically significant main effect for program level. Both the average number of unique words and the average number of words per selection increased across levels. This finding suggests some attention on the part of the publishers to the instructional design factor, in the sense of providing more practice with fewer words at the earlier levels. These averages are lower than those found in Menon and Hiebert's (1999) analysis of the basal anthologies submitted for the California adoption in the mid-1990s, which found averages of 170 words and 75 unique words per selection. This difference could reflect the influence that California's emphasis on more decodable text had on the 2000 adoption, or it could reflect the fact that Menon and Hiebert's data include only the words appearing in anthologies, not those in little books or decodable books.
When we look at only the anthologies in our data set, the averages are 165 words per selection and 72 unique words per selection.
Several other factors associated with the construct of instructional design showed a similar pattern across program levels. The percentage of words following the CVC pattern varied significantly across program levels, F(4,612) = 50.35, p < .001, declining from a high of 68.7% at Level 1 to 47.8% at Level 5. The percentage of unique rimes also varied significantly across program levels, F(4,612) = 70.08, p < .001, rising from 16.6% at Level 1 to 52.5% at Level 5. Finally, the average total instantiation of rimes varied significantly across program levels, F(4,612) = 9.394, p < .001, declining from 78.9 at Level 1 to 72.6 at Level 5. All three of these analyses suggest that the text is leveled in a way that reflects attention to the instructional design features that support decoding: at the earlier levels there are fewer rimes, more common patterns, and more instantiations of these patterns. Further analyses reveal that the selections designated as decodable by the publishers reflect these patterns more than do the selections designated as non-decodable. The average percentage of CVC words was 64.5% for the designated decodable text and 50.3% for the designated non-decodable text, F(1,615) = 124.37, p < .001. The average percentage of unique rimes was 41.3% for the designated decodable text and 35.6% for the designated non-decodable text, F(1,615) = 7.121, p < .01. The average instantiation of rimes was 82.7 for the designated decodable text and 70.0 for the designated non-decodable text, F(1,615) = 182.26, p < .001. This pattern of differences between the designated decodable and designated non-decodable texts suggests that the decodable requirement may have increased the within-word regularity of the text.
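The first of these measures, the percentage of CVC words, can be approximated with a short Python sketch. This is an assumption-laden simplification, not the CIERA program's rule: it works on letters rather than sounds, counts only three-letter words, treats "y" and "w" as consonants, and ignores digraphs (so a phonemically CVC word such as "ship" is not counted).

```python
import re

VOWELS = set("aeiou")

def is_cvc(word):
    """Letter-based CVC check: consonant, vowel, consonant (e.g., 'cat', 'hop')."""
    return (
        len(word) == 3
        and word[0] not in VOWELS
        and word[1] in VOWELS
        and word[2] not in VOWELS
    )

def percent_cvc(text):
    """Percent of running words in the text that match the CVC pattern."""
    words = re.findall(r"[a-z]+", text.lower())
    return 100 * sum(is_cvc(w) for w in words) / len(words)

print(percent_cvc("The big dog can hop on a log."))  # 62.5
```

Five of the eight running words (big, dog, can, hop, log) match, giving 62.5%. A phoneme-based check would need a pronunciation lexicon, which is beyond this sketch.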
This factor refers to the difficulty of the decoding demands placed on the reader to recognize words in the text, balanced by any "extra" support (e.g., surrounding words) that may assist the reader in successful word identification. The next set of tables offers data related to the decodability ratings generated by the CIERA text analysis program (Table 4) and the Hoffman et al. (1994) holistic scale for decodability (Table 5). Scores on the CIERA measure of decodability can range from an average of 1 (simple/common/regular words) to 8 (less common/less regular/more complex words). The patterns for each level are described in Figure 1. The CIERA analysis's concept of decodability is focused on the within-word level only. The data in Table 4 show the patterns by program, by program level, and by publisher designation (decodable vs. non-decodable). Average decodability across all five programs was 4.0 (with a range from 3.7 to 4.4). There was a statistically significant main effect for program level, F(4,592) = 39.83, p < .001; across the five programs, the average decodability rating rose from 3.5 at Level 1 to 4.5 at Level 5, indicating heavier decoding demands at the higher levels. The average across all programs was 3.7 for texts designated by publishers as decodable and 4.4 for texts designated as non-decodable. There was a statistically significant effect for designated decodable and non-decodable texts, F(4,5) = 43.87, p < .001. The difference in decodability was greatest at Level 1 (2.8 vs. 4.1) and smallest at Level 5 (4.3 vs. 4.6). These findings suggest that the decodable text requirement had the desired impact on the targeted text: by the CIERA measures, the texts are more decodable at the early levels, and the designated non-decodable text was indeed less decodable.
The data reported in Table 5 reflect our analysis of the Year 2000 basals using the holistic decodability scale adapted from the Hoffman, et al. (1994) study, on which scores range from 1 (high-frequency/phonically regular words) to 5 (more difficult/phonically irregular words). The average decodability across all five programs was 2.4, with program averages ranging from 1.9 to 2.8. Decodability ratings increased across program levels, from 1.8 at Level 1 to 2.7 at Level 5 (that is, the texts became harder to decode), and there was a statistically significant main effect for program level, F(4,607) = 30.17, p < .001. The average decodability rating across programs was 1.8 for texts designated as decodable by their publishers and 2.8 for texts designated as non-decodable. This difference was statistically significant, F(1,615) = 176.22, p < .001. The largest discrepancies were at the earliest levels of the programs.
Interestingly, the CIERA and Hoffman scales, both of which take into account all selections submitted by the publishers, reveal a trend toward decreasing decodability (that is, increasing decoding demands) across levels, suggesting that beginning readers are being asked to make bigger leaps earlier in their movement toward reading independence. The TEA index for decodability reveals no such trend.
We computed a correlation matrix to compare these three measures of decodability. Our analysis included only data from texts identified as decodable in the TEA analysis, since these were the only texts for which a score had been derived following the "have the skills been taught to decode the word" model. A substantial positive correlation was detected between the CIERA decodability measure and the holistic scale from the 1994 study (r = .64). This high correlation is not surprising, given that both measures construe decodability as a within-word feature of difficulty. However, there was no correlation between the TEA measure and either the CIERA assessment (r = -.07) or the holistic scale (r = -.08). This lack of correlation suggests important differences in focus between a decodability measure tied directly to word features and a decodability measure tied to instructional consistency.
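A correlation matrix of this kind pairs each measure with every other measure over the same selections. The sketch below assumes Pearson's product-moment r (the text reports r values without naming the coefficient) and uses invented per-selection scores purely for illustration; they are not the study's data, and the printed values say nothing about the actual results.

```python
from itertools import combinations
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

# HYPOTHETICAL per-selection scores for five selections (NOT the study's data).
measures = {
    "CIERA": [2.8, 3.5, 3.9, 4.2, 4.6],
    "holistic": [1.6, 2.0, 2.5, 2.4, 2.9],
    "TEA_percent": [85, 92, 81, 88, 90],
}

# Upper triangle of the correlation matrix: one r per pair of measures.
for a, b in combinations(measures, 2):
    print(f"{a} vs {b}: r = {pearson_r(measures[a], measures[b]):.2f}")
```

Because the CIERA and holistic scales both rise with difficulty, a positive r between them signals agreement; an r near zero with the TEA percentage, as reported above, means knowing a text's word-feature difficulty tells you little about whether its words had been taught.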
The ratings for average predictability are presented in Table 6. Scores on the holistic predictability scale (Hoffman et al., 1994) range from 1 (most supportive) to 5 (least supportive). Program ratings ranged from 3.4 to 3.9, with an average of 3.7. There were no clear trends in predictability across program levels: the average score was 3.6 at Level 1 and 3.8 at Level 5. Similarly, there was no clear difference between the predictability of designated decodable texts (average 3.6) and that of texts designated as non-decodable (average 3.8).
This second factor, engaging qualities, refers to the qualities that make a text interesting, appealing, relevant, and exciting to the reader. Ratings on the Hoffman et al. holistic scale for engaging qualities range from 1 (least engaging) to 5 (most engaging). The average holistic score across all texts in our study was 2.4, as shown in Table 7.
Average holistic ratings for the programs ranged from 2.2 to 2.8. There was a statistically significant main effect for program level, F(4, 613) = 35.94, p < .001, with ratings increasing from an average of 2.0 at Level 1 to 2.9 at Level 5. The average rating across programs was 2.2 for texts designated as decodable and 2.6 for texts designated as non-decodable. This difference was statistically significant, F(1, 616) = 28.76, p < .001, with the designated decodable texts rated as less engaging than the designated non-decodable texts. The differences were greatest at the lower program levels.
The three analytic subscales that underlie the holistic engaging-qualities construct (content, language, and design) were analyzed separately. Scores on each subscale ranged from 1 (lowest) to 5 (highest). Content ratings averaged 2.4 across programs and increased from 1.7 at Level 1 to 3.0 at Level 5; this trend was statistically significant, F(4, 613) = 62.85, p < .001. The average content rating was 2.2 for designated decodable texts versus 2.6 for designated non-decodable texts, a statistically significant difference, F(1, 616) = 27.85, p < .001.
Language ratings averaged 2.3 across programs and increased from 1.7 at Level 1 to 2.9 at Level 5, a statistically significant main effect, F(4, 613) = 55.44, p < .001. The average language rating was 2.0 for designated decodable texts versus 2.6 for designated non-decodable texts, a statistically significant difference, F(1, 616) = 96.03, p < .001. Design ratings averaged 2.7 across programs. No statistically significant change in design ratings was found across program levels.
Across all of these analyses, we consistently found that the more decodable the text, the lower the ratings on engaging qualities, suggesting that the mandate to focus on decodability of text had negative implications for other aspects of texts for beginners.
Our second research question focused on historical trends across basal adoptions for beginning readers in Texas. For this analysis, we included only data from the programs of the three publishers that took part in the 1987, 1993, and 2000 adoption cycles (publishers/programs A, D, and E). The total number of words and the total number of unique words in these samples are presented in Table 8. The dramatic decrease in the total number of words between the 1987 and 1993 programs was reversed in the Year 2000 basals. The total number of unique words increased dramatically, from 959 in the 1987 editions to 1544 in the 1993 editions, and continued to increase, to 1792, in the 2000 editions.
Because the numbering of levels within programs differed between the 1993 and 2000 editions, we reanalyzed the data for each edition considering only the first 1000 words of each program. We counted the first 1000 words in a program and then continued to the end of the selection in which the 1000th word fell. While the cutoff of 1000 words was somewhat arbitrary, the resulting emphasis on the earliest parts of the programs was intentional.
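The sampling rule (count to 1000 words, then finish the current selection) can be sketched as below. This assumes each program is available as an ordered list of (title, text) pairs and that words are whitespace-delimited; the function name and data layout are hypothetical, not the authors' procedure.

```python
def first_n_words_sample(selections, n=1000):
    """Take whole selections, in program order, until at least n running words
    are included; the selection containing the nth word is kept in full."""
    sample, total = [], 0
    for title, text in selections:
        if total >= n:
            break  # the nth word was reached in an earlier selection
        sample.append(title)
        total += len(text.split())  # whitespace tokenization (an assumption)
    return sample, total


# Toy program of three selections, using a cutoff of 4 words for illustration.
program = [("A", "one two three"), ("B", "four five"), ("C", "six")]
print(first_n_words_sample(program, n=4))  # (['A', 'B'], 5)
```

Because whole selections are kept, the final count usually overshoots the cutoff slightly, which is why the totals in the sample vary from program to program.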
Continuing to the end of the selection that contained the 1000th word led to slight differences in the total number of words included (from 1001 in Program E of the 2000 edition to 1110 in Program D of the 1987 edition). These data, along with the data on the total number of unique words, are presented in Table 9. The numbers suggest a decline from the 1993 to the Year 2000 basals in the total number of unique words presented at the early stages of the programs. This suggests increasing control over the use of unique words in the earliest texts encountered by beginning readers, although this control is still not as rigorous as that found in the 1987 editions.
In Tables 10 and 11, we present data from the CIERA Text Analysis program for the first 1000 words. These analyses examine decodability as well as several specific variables related to instructional design. Data on decodability and density are presented in Table 10. Here we see evidence of more decodable text at the earlier levels, with the 2000 texts being the most decodable. Density, a CIERA Text Analysis variable also reported in Table 10, reflects the relationship between running words and unique words. The statistic can be interpreted as the average number of words a reader would encounter before meeting a unique (i.e., new) word; it is, in effect, another way of looking at the word counts in Table 9. The 2000 basals are less dense than the 1993 series but still more dense than the 1987 series, even in the first 1000 words. There appears to be more control over how often beginning readers encounter new words, suggesting a move back toward a more controlled vocabulary for beginning readers, albeit not as controlled a lexicon as that of the 1987 programs.
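The interpretation of density given here (average words met per new word) corresponds to a ratio of running words to unique words. The sketch below computes that ratio; the exact formula used by the CIERA Text Analysis program is not given in this report, so treat this as an assumption, and the helper names are hypothetical.

```python
import re


def tokenize(text):
    """Lower-case running words; punctuation is dropped (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def words_per_new_word(text):
    """Running words divided by unique words: on average, how many words
    a reader meets for each new word."""
    words = tokenize(text)
    unique = set(words)
    return len(words) / len(unique) if unique else 0.0


print(words_per_new_word("The cat sat. The cat ran."))  # 1.5
```

Six running words drawn from four unique words give 1.5 words per new word; a more repetitive text yields a higher value under this reading.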
The CIERA Text Analysis program also reports the number of different rime patterns (i.e., phonograms) included in the text and the percentage of the total text made up of instances of these rimes. Both the number of rimes and their instantiations are clearly up from the 1993 levels (Table 11), and in some cases are even higher than in the 1987 programs. These findings suggest increasing attention to instructional design as a factor in constructing texts, although this attention does not appear to be purposeful on the part of publishers or policymakers.
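Counting rime patterns and their instantiations can be sketched as follows. This is a naive illustration, assuming a word's rime is everything from its first vowel letter onward and assuming a small hypothetical list of target phonograms; it is not the CIERA program's rime list or algorithm.

```python
import re

# Hypothetical subset of common phonograms; not the CIERA program's list.
COMMON_RIMES = {"at", "an", "ack", "ake", "ay", "ell", "ill", "ing", "op", "ug"}


def rime(word):
    """Naively take the rime as everything from the first vowel letter onward."""
    match = re.search(r"[aeiou].*", word.lower())
    return match.group(0) if match else None


def rime_stats(words, rime_list=COMMON_RIMES):
    """Return (distinct target rimes found, % of running words instantiating one)."""
    hits = [r for r in (rime(w) for w in words) if r in rime_list]
    pct = 100 * len(hits) / len(words) if words else 0.0
    return sorted(set(hits)), pct


print(rime_stats(["cat", "sat", "ran", "the", "big", "dog"]))  # (['an', 'at'], 50.0)
```

Here two distinct rimes ("at", "an") appear, and three of the six running words instantiate one of them.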
In Table 12 we present our analysis of text accessibility in the first 1000 words that beginning readers encounter, using the holistic decodability and predictability scales applied in the Hoffman et al. (1994) study. For decodability, we found a statistically significant main effect for year, F(2, 133) = 23.11, p < .001. The 2000 basals show a shift toward more decodable text at this early level (average = 1.7), down from the 1993 level (average = 2.5) and closer to the 1987 level (average = 1.2). For predictability, we also found a statistically significant main effect for year, F(2, 133) = 28.87, p < .001. The 2000 basals show a shift toward less predictable text at this early level (average = 3.5, compared with 2.5 in 1993), although they remain more supportive than the 1987 basals (average = 4.5). This implies that readers of the current texts have far less "extra" word support to draw on to engage successfully with the text.
The findings related to engaging qualities are presented in Table 13. From the 1993 to the 2000 series, the table shows statistically significant declines in content, from 2.5 to 1.7, F(2, 133) = 22.67, p < .001; in language, from 2.7 to 1.5, F(2, 133) = 27.14, p < .001; and in design, from 3.4 to 2.9, F(2, 133) = 9.63, p < .001. Holistic ratings also declined, from 3.1 to 2.1, F(2, 133) = 38.96, p < .001. The gains in holistic engaging qualities made from 1987 to 1993 have been reversed in the current adoption. This suggests, perhaps, that the attention to decodability has deprived beginning readers of other crucial factors that support the psychological and social aspects of reading.
We have described the vocabulary control and decodability features of the first-grade programs from several perspectives. Specifically, we have detailed the ways in which these texts exhibit control at the earliest levels of the programs. We have described the apparent absence of attention to predictability and engaging qualities at the first-grade levels in the Year 2000 programs. Finally, we have documented contrasting trends from 1987 to 2000. These analyses confirm, once again, that policy mandates introduced through state textbook adoption have a direct influence on the content of the reading programs put into the hands of teachers and the reading materials put into the hands of their students. The publishers of the Year 2000 series met the standards set by the state by applying a decoding construct that considers only the relationship between explicitly taught skills and the characteristics of the words being read. The patterns of decodability are less clear when examined from a within-word perspective. Our findings suggest that, regardless of the conception of decodability, the decoding demands of these programs are lower at the early levels and higher at the later levels. Both conceptions (i.e., instructional consistency and phonic regularity) are reflected in the texts designated as decodable by the publishers. Yet the direct measures associated with these two conceptions are not significantly correlated. Two different conceptions, operating in parallel but not identical ways, appear to be in effect. This raises the disturbing possibility that one conception is being manipulated directly by policy while the other is allowed to vary freely. Without evidence from research with students using these materials, we are left to speculate about why these differences exist and whether they merit instructional consideration.
The historical comparisons suggest that while the intended goal of making these texts more decodable has been achieved, albeit in uncertain ways, other important changes may stem from a lack of attention. The basal texts of the 2000 adoption are far less predictable than those from the previous adoption; that is, there is far less "extra" word support to help the reader engage successfully with the text. At the same time, the quality of the literature appears to have suffered a severe setback relative to the 1993 adoption. Ratings of engaging qualities have dropped, particularly at the earliest levels of the programs. Here again, we are left to speculate about the "costs" of giving up on predictability and engagingness. We do know that the loss of engaging qualities is likely to affect student motivation, and that the loss of predictability has direct consequences for reading accuracy, rate, and fluency (Hoffman, Roser, Patterson, Pennington, & Salas, 2001).
The findings from this study are both encouraging and troubling. On the positive side, we find increased attention to instructional design and decodability in the Year 2000 programs. On the negative side, however, a lack of attention to other crucial variables, such as engaging qualities and predictability, may have produced mixed effects. We can only assume that publishers and policymakers had no intention of decreasing the engagingness of the texts. We can only assume they were not trying to lower the predictability support features of the texts. And yet both of these outcomes are documented in our data. The danger is that an extreme focus on decodability may cause us to lose sight of other factors that should be considered in the development of texts for beginning reading.
Even within the area of decodability, the results suggest that more careful work is needed. The "instructional consistency" conception of decodable text (i.e., words are deemed decodable based on the skills that have been taught) reflects a rational model of teaching and learning that makes superficial sense (Shulman, 1986), but as research on teaching has demonstrated over the past two decades, teaching and learning are not always, or even typically, rational. Indeed, teaching and learning are complex domains shaped by numerous influences and factors. The assumption that teachers will systematically follow a scope and sequence from a basal is directly contradicted by the research (Hoffman et al., 1998). This is not to suggest that articulation between the scope and sequence for skills and the texts is inappropriate in program design, but it does suggest that a conception of decodable text resting on the assumption of this connection may be flawed.
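The "instructional consistency" conception can be made concrete as a check of each word against what has been taught so far. The sketch below is a hypothetical illustration, assuming a word counts as decodable if it is a taught sight word or can be split entirely into taught graphemes; the function names and skill sets are invented, and this is not the TEA procedure. Note that it checks only letter coverage, not whether a grapheme carries its taught sound in a given word (e.g., "o" in "to").

```python
def can_segment(word, graphemes):
    """True if `word` can be split entirely into taught graphemes.

    ok[i] records whether the first i letters can be covered (dynamic programming).
    """
    ok = [True] + [False] * len(word)
    for end in range(1, len(word) + 1):
        ok[end] = any(ok[start] and word[start:end] in graphemes
                      for start in range(end))
    return ok[-1]


def is_decodable(word, taught_graphemes, taught_sight_words=frozenset()):
    """Decodable if taught as a sight word or fully covered by taught graphemes."""
    word = word.lower()
    return word in taught_sight_words or can_segment(word, taught_graphemes)


def percent_decodable(words, taught_graphemes, taught_sight_words=frozenset()):
    """Percentage of running words that pass the instructional-consistency check."""
    if not words:
        return 0.0
    hits = sum(is_decodable(w, taught_graphemes, taught_sight_words) for w in words)
    return 100 * hits / len(words)


# Hypothetical skills taught so far: five letters plus the digraph "sh".
taught = {"c", "a", "t", "sh", "i", "p"}
print(percent_decodable(["cat", "ship", "hip", "the"], taught, {"the"}))  # 75.0
```

The same word can be decodable at one lesson and not at an earlier one, which is why this conception depends on teachers actually following the scope and sequence.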
It may prove more effective to locate the "instructional consistency" perspective within an "instructional design" construct, which focuses on the progression of decoding practice and instruction across levels. This would position decodability as a within-word dimension, alongside predictability, among the text accessibility factors. Our data continue to support the conception of decodability as a word-level factor that operates in conjunction with predictability to produce "accessibility." In this view, instructional design as a text factor attends to the progression of within-word features across levels of text, whereas decodability is placed alongside predictability to describe accessibility at a given point in time. The two constructs are clearly related, but they differ in emphasis.
Finally, the data confirm that the roller-coaster ride of changes in texts for beginning reading continues to reflect the actions of policymakers. State adoption policies, particularly in California and Texas, are forcing changes in textbooks with minimal consideration for research or the marketplace. Neither the calls for "authentic literature" in the 1990s nor the calls for "decodable text" in 2000 rested on a sound and complete theoretical conception of beginning reading and teaching. In some instances, we see politics result in the "illusion" of change. The basal programs associated with the Year 2000 adoption, for example, are commonly regarded as decodable. And yet, in fact, only a portion of the selections in these series is "decodable" by definition. In other instances, we see politics result in real changes, as with the decline in the engaging qualities of texts in the Year 2000 basals. One variable is manipulated and the others are ignored. If policymakers are determined to intervene directly, then they must draw on a theoretically rich and research-based conception of texts, one that is inclusive of multiple factors (e.g., instructional design, accessibility, and engaging qualities).
Better yet, if the policy community truly desires accountability for tax dollars spent and high educational standards, it will free the marketplace to work. The consumers (teachers and those closest to the use of the texts) must be empowered to make decisions about when and what to purchase. Publishers will respond with products that meet the demands of the marketplace. Innovation and variation will be encouraged rather than discouraged, as they are in the textbook business today. Research will assume its proper role, revealing complexity and providing insight, rather than being held up as a template for success.
If marketplace forces had been allowed to work after the introduction of literature-based texts in the 1990s, teachers would have demanded more careful leveling of text without compromising the increase in engaging qualities that is so motivating for students. In this study, we have found examples of texts, across programs and levels, that combine access and support (i.e., attention to decoding demands and support through predictable features) with high engaging qualities. Why can't publishers compete on these terms? They would, if the marketplace were allowed to function with free competitive forces. As in any sector of a democratic and capitalistic economy, policymakers are charged with ensuring the freedom of the market and protecting against abuses. This role is critical, particularly now, with a shrinking number of competitors in the textbook publishing industry.
There is no disputing the fact that teachers, researchers, and policymakers share the goal of ensuring full literacy for all students. But there are no simple answers to the challenges we face (Duffy & Hoffman, 1999). Complexity is inherent in education and literacy. Textbook selection will continue to be an important consideration in reading instruction, but it will never be a solution to all the challenges of teaching. Texts can never be anything more than a resource for effective teachers. The goal must be to develop texts that meet the needs of teachers, not textbooks that create the illusion of a "teacher-proof" curriculum. We are concerned that the ill effects of states' efforts to manipulate instruction through state control over textbooks are now manifesting themselves at the national level, as the federal government attempts to prescribe certain programs and materials as "effective." This is a dangerous path to follow, given our experiences with state control over textbooks, and one that should be questioned and challenged.
That textbooks will continue to change is a given. We envision a time in the near future, however, when these changes are shaped by the students and teachers using these texts, as well as by the findings of research into text characteristics and how children learn to read.
Austin, M. C., & Morrison, C. (with Kenney, H. J., Morrison, M. B., Gutmann, A., & Nystrom, J. W.). (1961). The torch lighters: Tomorrow's teachers of reading. Cambridge, MA: Harvard University Press.
Beck, I. L. (1981). Reading problems and instructional practices. In G. E. MacKinnon & T. G. Waller (Eds.), Reading research: Advances in theory and practice (Vol. 2, pp. 53-95). New York: Academic Press.
Hoffman, J. V., McCarthey, S. J., Abbott, J., Christian, C., Corman, L., Dressman, M., et al. (1994). So what's new in the new basals? A focus in first grade. Journal of Reading Behavior, 26(1), 47-73.
Hoffman, J. V., McCarthey, S., Elliott, B., Bayles, D. L., Price, D. P., Ferree, A., et al. (1998). The literature-based basals in first grade classrooms: Savior, Satan, or same-old, same-old? Reading Research Quarterly, 33, 168-197.
Rhodes, L. K. (1979). Comprehension and predictability: An analysis of beginning reading materials. In R. G. Carey & J. C. Harste (Eds.), New perspectives on comprehension (pp. 100-131). Bloomington, IN: Indiana University School of Education.
Roser, N. L., Hoffman, J. V., & Sailors, M. (in press). Leveled texts in beginning reading instruction. In J. Hoffman & D. Schallert (Eds.), Read this room: Texts in beginning reading instruction. Mahwah, NJ: Erlbaum.
Shulman, L. S. (1986). Paradigms and research programs in the study of teaching: A contemporary perspective. In M. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 3-36). New York: Macmillan.
We analyzed our data files using the CIERA Text Analysis Program (Version 1.3; Martin, 1999). This software program analyzes the words, word patterns, and rimes in texts. As the program's author notes, "The output of this program can be used to determine the difficulty and appropriateness of beginning reading texts" (p. 2). Since the output is quite detailed, we report here only those data most directly related to our research goals:
Each word in the text is rated on an 8-point scale from easy (1) to hard (8). All words of more than one syllable receive a score of 8. Single-syllable words with two or more vowels are scored from 4 to 7, depending on the complexity of the pattern. One-vowel words are scored from 1 to 3, depending on the presence of digraphs or other factors. The score reported in the output is the average decodability score across the entire text.
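The banding described above (8 for multisyllabic words, 4 to 7 for multi-vowel words, 1 to 3 for one-vowel words, then a text average) can be sketched as follows. The syllable heuristic, the digraph list, and the rule for placing a word within its band are all hypothetical stand-ins, since the program's internal rules are not reported here; this is not Martin's implementation.

```python
import re

# Hypothetical digraph list; the program's actual list is not reported.
DIGRAPHS = ("sh", "ch", "th", "wh", "ck", "ng")


def syllable_estimate(word):
    """Rough syllable count: vowel groups, minus a silent final 'e' (as in 'make')."""
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith(("le", "ee")) and n > 1:
        n -= 1
    return max(n, 1)


def word_score(word):
    """Score a word from 1 (easy) to 8 (hard) using the bands described in the text."""
    w = word.lower()
    if syllable_estimate(w) > 1:
        return 8  # all multisyllabic words score 8
    vowel_letters = len(re.findall(r"[aeiou]", w))
    digraphs = sum(d in w for d in DIGRAPHS)
    if vowel_letters >= 2:
        return min(4 + digraphs, 7)  # 4-7 band; placement rule is assumed
    # 1-3 band; words with no a/e/i/o/u (e.g. "my") also land here, by assumption.
    return min(1 + digraphs, 3)


def text_decodability(words):
    """Average word score across the text, as reported in the program's output."""
    scores = [word_score(w) for w in words]
    return sum(scores) / len(scores) if scores else 0.0


print(text_decodability(["cat", "ship", "make", "rabbit"]))  # 3.75
```

The four example words score 1, 2, 4, and 8, so the text average is 3.75; lower averages indicate more decodable text on this scale.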
Summary of the Texas Education Agency Basal Analysis Plan Related to Decodability (Source: S. Dickson, personal communication, 3 December 1999). Only selections designated by the publisher as decodable were analyzed using the system that follows.