This report examines the alignment between state standards and assessments in elementary reading. The impetus for our study was a request by the National Research Council's Committee on Title I Testing and Assessment for information on the extent to which students' performance on state assessments could be assumed to provide evidence of their level of achievement of state standards. We begin our report by providing the context surrounding the Title I legislation, which gave rise to concerns about alignment.
The 1994 Improving America's Schools Act (IASA) reauthorized Chapter 1 of the Elementary and Secondary Education Act (and returned Chapter 1 to its original name, Title I). This reauthorization brought with it some dramatically different strategies for meeting the educational needs of disadvantaged children. The new Title I calls for high standards for all children, and systemic reform strategies to enable all children to achieve these standards. Specifically, the Title I statute states that the standards:
shall include: (i) challenging content standards in academic subjects that (I) specify what students are expected to know and be able to do; (II) contain coherent and rigorous content; and (III) encourage the teaching of advanced skills; (ii) challenging student performance standards that (I) are aligned with the state's content standards; (II) describe two levels of performance, proficient and advanced, that determine how well children are mastering the material in the state content standards; and (III) describe a lower level of performance, partially proficient, to provide complete information about the progress of lower performing children toward achieving to the proficient and advanced levels of performance. For the subjects in which students will be served under this part,... the state plan shall describe a strategy for ensuring that [Title I] students are taught the same knowledge and skills and are held to the same expectations as are all children (as quoted in Rothman & Elmore, 1997).
The Title I statute makes the link between standards and assessments apparent by requiring states to develop assessments that are "aligned with the state's challenging content and performance standards and provide coherent information about student attainment of such standards" (IASA, U.S. Congress, 1994, p. 8). Assessments must be aligned to standards; otherwise, students preparing to do well on the tests will be performing tasks unrelated to the standards, and parents and the community will receive misleading information about children's performance (Rothman & Elmore, 1997). Compliance with this legislation has meant that states have new, critical roles to play. The high-quality, challenging standards and assessments at the core of this legislation are to be created by the states, not mandated by the federal government. States must develop their own content and performance standards and high-quality, carefully aligned assessments in order to determine how well children are meeting those standards (Payzant & Levin, 1993, p. 70).
States' progress in meeting the law's requirements was the subject of a report by Mary Jean LeTendre, the U.S. Department of Education's Director of Title I, to the NRC Committee on December 5, 1997 (Committee minutes). At the time of this report, Dr. LeTendre indicated that the Department of Education had approved the content and performance standards for 18 states; 19 states had received waivers and had until May 1998 to submit evidence that their standards were acceptable; another 4 states had waivers pending; and 9 states could not foresee meeting the May deadline. She also reported that most states had developed content standards, but many appeared to be having problems coming up with performance standards. The problem was due in part to the fact that states were not developing performance assessments on the basis of their content standards--rather, they were using existing tests that may not have been completely aligned with those standards.
At the same December 1997 meeting, Ed Roeber, then of the Council of Chief State School Officers, reported that states had widely varied systems of assessment in place. He indicated that this situation came about in part because test publishers had been successful in convincing state officials that their tests measured state standards, when in fact the tests measured students' understanding of what was being taught. Studies that have been conducted show little correlation between tests and standards (e.g., Smith, 1997).
The importance of alignment to systemic reform in general, and Title I reform in particular, means that determining the extent of alignment within a system is a critical step in evaluating the success of reform efforts. According to Webb (1997), two or more of a system's components are aligned if they are in agreement or match each other. In educational discourse, the concept of alignment most commonly refers to the match between an assessment instrument (or instruments) and a curriculum. Both expectations and assessments are now of great concern to educators and policymakers, as the keys to standards-based education, systemic reform, and accountability.
Because of the centrality of expectations and assessments to current thinking on reform, the Committee was particularly interested in the status of state alignment of these two elements in the core Title I subject areas of mathematics and reading. Norman Webb had already undertaken an evaluation of alignment in mathematics and science that could provide information about mathematics, and the study reported here focused on elementary reading. Because the time available for this study was short and the Committee was interested in comparisons between mathematics and reading, we adapted the general concepts and procedures concerning alignment that were developed for mathematics (Webb, 1997).
Webb (1997) defines alignment as the degree to which expectations and assessments are in agreement and serve in conjunction with one another to guide students' learning toward what they are expected to know and do (p. 4). As such, alignment is considered a quality of the relationship between expectations and assessments, and not an attribute of one of these elements independent of the other. Alignment is not limited to a comparison between a single assessment instrument and a curriculum, but extends to a set of assessment instruments or the assessment system as a whole. Webb begins by quoting a report of the Mathematical Sciences Education Board (MSEB):
"The term alignment is often used to characterize the congruence that must exist between an assessment and the curriculum. Alignment should be looked at over time and across instruments" (MSEB, 1993, p. 123). A single assessment may not be well aligned with curriculum because it is too narrowly focused, but it may be part of a more comprehensive collection of assessments that is in full alignment with the curriculum. (Webb, 1997, p. 3)
It is difficult to judge alignment for several reasons. Expectations and assessments are frequently expressed in multiple documents, making it difficult to assemble a complete picture. It is also difficult to establish a common language for describing different policy elements. Finally, the policy environment in an educational system can be constantly changing. For example, new goals are sometimes mandated while old forms of assessment are still in place.
The most common methods used by states to align components of their educational systems are described by Webb (1997) as: sequential development; expert review; and document analysis. The sequential development approach involves the development of standards, frameworks, and assessments in sequence, so that each component is aligned with the one from which it is derived. Alignment by sequential development is frequently controlled within an agency, and is less likely than the other types to include some form of external review. In the absence of such a review, alignment can be strengthened by incorporating checking procedures that can be used by agency staff. One disadvantage of the sequential development approach is the amount of time needed to put a program in place. This approach also ignores the need for synergy among policy elements. Even when states declare that they have established alignment through sequential development, the process itself is actually much more dynamic and recursive than linear.
The expert review approach involves convening a panel of experts to review system components and judge the quality and extent of their alignment. Some states have built external review panels into the process for developing important elements of their system. The quality of expert reviews depends in part on the qualifications and expertise of the reviewers. Content area specialists are essential for any review panel judging the match between expectations and assessments. Providing opportunities for reviewers to interact and build consensus helps improve the quality of the review.
The document analysis approach involves coding and analyzing the match among documents that convey expectations and assessments. This is the approach undertaken by Webb and the authors of the present report, as well as by other alignment studies external to the states, such as ACHIEVE and TIMSS. The document analysis approach requires use of a common metric to compare the curriculum and assessments. The reliability of the partitioning and coding of documents can be checked using sampling techniques.
The fall 1995 survey by the Council of Chief State School Officers, reported in Webb (1997), suggests that the sequential development method may be the approach most frequently used by states. In this survey, state assessment directors were asked, "What does alignment mean in your state?" The most common response was that assessment activities and content standards were aligned by design. For example, "aligned means assessments will be based on the standards and indicators" or "the assessments are actually... designed to measure... outcomes and requirements stated in goals and objectives. Committees approve and reject items based upon their fit with goals and objectives." Or "Curriculum frameworks provide the assessment framework for developing tests. All test questions, etc. are developed to meet the curriculum objectives." In most of the states, frameworks and assessments were judged to be aligned if goals and learning objectives were considered in some way in the design or selection of the assessment instruments (or vice versa). Most states lacked a formal, systematic process for determining the alignment among standards, frameworks, and assessments.
Our document analysis of alignment in elementary reading was guided by the following two questions, among others: How can we characterize the alignment of state standards and assessments in elementary reading? How can we characterize the document analysis method used to evaluate alignment in elementary reading?
Our sample for this survey of the alignment of state standards and assessments in elementary reading comprised all 50 states and the District of Columbia. The data were collected and analyzed in late 1998 and early 1999.
A sample of four states was selected for more in-depth analysis, from the pool of states with approved standards and assessments that the states themselves had indicated were in alignment. This sample was selected pursuant to the Committee's advice that the study needed to provide more in-depth analyses of states with a variety of approaches to, and histories of, reform in general, and alignment in particular. The four selected states can be characterized as follows. State A has a set of mastery-oriented standards and uses a commercially published, norm-referenced test as its assessment. State B has a long history of promoting a heavily skills-oriented curriculum, and uses a state-developed, objective-referenced exam. State C is fairly new to reform and uses a combination of a norm-referenced and a state-developed exam. State D has a long history as a reforming state and uses a nationally developed criterion-referenced exam, in combination with an individualized oral reading assessment.
We began our investigation with the intention of doing a fairly in-depth analysis of a small number of states. Our first step was to identify the pool of states that had indicated that their approved reading standards and assessments were in alignment. To identify this pool, we gathered information about the status of reading standards and assessments from all 50 states and the District of Columbia through brief telephone interviews. This preliminary information revealed some interesting trends, so we decided to expand our telephone interviews and include information from the 50 states and the District of Columbia as part of our report. A list of the interview questions used to gather information is provided in Appendix A.
From the information gathered in the initial interviews, we identified a pool of states from which we could select a sample for more in-depth analyses. All states which reported the alignment of approved standards and assessments were included in this pool. With the advice of the Committee, we further narrowed the pool and ultimately selected four states that represented different approaches to alignment. The National Academy of Sciences and the National Research Council mailed out letters describing the study and asking the selected states to participate. Nondisclosure agreements were signed and we were provided with the assessments needed to conduct our analyses.
We adapted criteria which were originally developed for mathematics and science, and used them to evaluate the alignment of reading standards and assessments in our four sample states. These mathematics and science criteria were developed with the input of an expert panel formed by the National Institute for Science Education (NISE) and the Council of Chief State School Officers (CCSSO) (Webb, 1997). These criteria focused on five areas: Content; Articulation across Grades and Ages; Equity and Fairness; Pedagogical Implications; and System Applicability. Given the purposes of our investigation and the short amount of time available for completion of the study, only the Content criteria were used for the analyses reported here.
The mathematics and science criteria were intended to provide a means for thinking about alignment. As such, they refer to the correspondence or comparability between standards and assessments. The Content criteria (Webb, 1997) include six subcategories: Categorical Concurrence; Depth of Knowledge Consistency; Range of Knowledge Correspondence; Structure of Knowledge Comparability; Balance of Representation; and Dispositional Consonance.
In working with these criteria, we found that Categorical Concurrence was not a good indicator of alignment in reading, due largely to the variety of ways in which states have chosen to deal with reading and the other language arts in their standards and assessments. We also dropped the Dispositional Consonance category when we added a new criterion, which we call Coverage. The Coverage criterion addresses the extent to which the objectives, both within each standard and overall, are represented by at least one assessment item. This analysis allowed us to determine whether there were dispositional standards/objectives that were not addressed by the assessment--thus eliminating the need for a separate Dispositional Consonance category.
As a result of these changes, we ultimately evaluated the alignment of standards and assessments in terms of five criteria: Range of Knowledge Correspondence; Balance of Representation; Coverage; Depth of Knowledge Consistency; and Structure of Knowledge Comparability. It was also necessary to determine exactly how to identify standards and assessments that were specific to reading. We addressed this issue by analyzing all standards and objectives that specified reading in their titles. In cases where standards and objectives were integrated across the language arts and reading was not identified separately, we analyzed the entire set of standards and objectives. In analyzing the assessments, we eliminated from consideration any assessment items that were not directly tied to a reading score of some type.
The standards and assessments for each of the four focus states were coded in the following manner. Each standard and its subordinate objectives were typed down one side of a spreadsheet and rated for cognitive complexity. Assessment items were then numbered, and the numbers were placed across the top of this spreadsheet. The coding of each assessment item according to the cognitive complexity rubric was then put into every cell which corresponded to an objective that the item appeared to measure. Each assessment item, therefore, might be related to more than one standard or objective. Assessment items that did not "hit" any objectives were simply included in the tally of the total number of assessment items. A sample coding sheet is presented in Appendix B.
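The coding sheet described above amounts to a matrix with objectives as rows and assessment items as columns. The sketch below illustrates the structure of that tally; the objective labels, item numbers, and hit codes are hypothetical, invented for illustration, and do not reproduce any state's actual data.

```python
# Sketch of the coding sheet: objectives as rows, assessment items as columns.
# All labels and hits below are hypothetical, for illustration only.

# Each objective is rated for cognitive complexity (levels 1-3 in this study).
objective_levels = {
    "RC-1 identify explicitly stated information": 1,
    "RC-2 determine sequence": 1,
    "RV-1 recognize synonyms and antonyms": 1,
    "RC-3 identify implicit theme or main idea": 3,
}

# Cells: (objective, item number) -> cognitive-level code for that item.
# An item may "hit" more than one objective; items that hit no objective
# still count toward the total number of assessment items.
hits = {
    ("RC-1 identify explicitly stated information", 1): 1,
    ("RC-1 identify explicitly stated information", 2): 1,
    ("RV-1 recognize synonyms and antonyms", 3): 1,
    ("RC-3 identify implicit theme or main idea", 4): 2,
}
total_items = 5  # hypothetical item 5 hit no objective

# Tally hits per objective, as on the coding sheet.
hit_counts = {obj: 0 for obj in objective_levels}
for (obj, _item), _level in hits.items():
    hit_counts[obj] += 1

for obj, count in hit_counts.items():
    print(f"{obj}: {count} hit(s)")
```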
We "trained" ourselves in this process by analyzing the alignment of standards and assessments from one of the states in our sample of four. As we did so, we developed and refined a list of decision rules that guided our evaluations of which objectives were "hit," or assessed, by each item. The resulting list of decision rules reads as follows:
After working together on the evaluation of one state and refining both the rubric for cognitive levels and the decision rules, we proceeded to the next state. Following Webb's recommendations for training procedures, we calibrated our analyses by discussing a few "anchor," or typical, items. After some slight adjustments following the calibration, we analyzed our independent ratings. We agreed on the cognitive levels of 80% of the items and 94% of the objectives. Ninety-four percent of our "hits," or evaluations of the objectives assessed by each item, were the same. We decided that this level of agreement was sufficient for us to independently analyze the final two states.
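The agreement figures above are simple percent-agreement statistics: the share of items or objectives on which two raters assigned the same code. A minimal sketch of that computation, using invented ratings:

```python
# Percent agreement between two raters' codes; the rating lists are
# invented for illustration and do not come from the study's data.

def percent_agreement(rater_a, rater_b):
    """Share of cases (as a percentage) on which two raters agreed."""
    if len(rater_a) != len(rater_b):
        raise ValueError("rating lists must be the same length")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100.0 * matches / len(rater_a)

# Hypothetical cognitive-level ratings for ten items by two raters.
item_levels_a = [1, 1, 2, 1, 2, 1, 1, 2, 1, 1]
item_levels_b = [1, 1, 2, 2, 2, 1, 1, 1, 1, 1]
print(f"item agreement: {percent_agreement(item_levels_a, item_levels_b):.0f}%")
```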
Our survey of statewide standards and assessments revealed that virtually all states have actively moved toward establishing standards and assessments (see Figure 1). Specifically, 50 states (98%) currently have or will have in place statewide standards by the end of 1999. The lone exception is a state that requires local districts to develop and implement their own standards. This state does not plan to give a statewide assessment. At least 40 states (78%) will have implemented statewide assessment systems to go along with their standards by the end of 1999. Three states reported that the development of their systems was in progress and they did not have a scheduled implementation date. It seems reasonable to expect that by the end of the 2000-2001 school year all but one state (98%) would have put their assessment programs into place.
The increase in the number of approved state standards from 1994 to 1996 appears to correspond to changes in Title I requirements. The period from 1996 to 1999 reflects the greatest increase in the number of statewide assessment systems being used for the first time, suggesting that in most cases assessments have followed close behind standards approval.
The variety we anticipated in the ways the domain of English language arts is conceptualized and parsed was most evident in the categories used for reporting student performance (see Table 1). There were few similarities among states in these categories, with the exception of those reporting general reading (32 states, or 64%), writing (27 states, or 54%) or language arts (17 states, or 34%) scores. Some states break general reading, writing, and language arts scores down into subcategories such as vocabulary ability or language mechanics. Others include categories distinct to other areas of the language arts, such as speaking, listening, and viewing.
With particular attention to the standards themselves, we found that the largest number of states--36 (72%)--cluster their standards around the primary/elementary, intermediate/middle school, and high school levels. Twelve other states (24%) have developed standards for each grade level. Two of these 12 states have also developed standards for each course taken at the high school level. In contrast, 2 states (4%) have written K-12 standards that allow school districts to meet local student needs as they deem best.
The assessments also show a predominant focus on the primary, intermediate, and high school levels (see Table 2). At present, only one state (2%) assesses students at every grade. The most frequently assessed grade in reading/language arts is the eighth grade, with 44 states (88%) testing students at that level. Next on the list is the fourth grade, with 39 states (78%) assessing students at this grade. It appears that greater emphasis is given to primary and intermediate levels, specifically the elementary school level, with 52% of the states assessing students in the third grade, and 56% assessing them in the fifth grade.
Our accounting of states' elementary reading assessments shows a great variety in the types of reading assessments they use (see Table 3). State-developed assessments are the most widely used: 31 (60%) of the states have developed their own assessments for measuring reading abilities. The second most frequent type of assessment is the commercial, "off-the-shelf" type, which is used by 26 (51%) of the states. Three states (6%) allow districts to choose among a selection of commercially available tests. Four states (8%) use a custom-designed test that was developed by a test publisher for use exclusively by that state. Only one state does not require an assessment; ten states require two assessments. Nine of those that require two assessments employ both a state-developed and an off-the-shelf commercial test.1 In contrast, the remaining state in this group employs an off-the-shelf and a customized commercial test. It should be kept in mind that this is a report of the current state of affairs in state-mandated reading assessment. Two states that currently use commercial, off-the-shelf assessments report that they are developing their own assessments; another reports that it will soon begin using a customized commercial test.
We found a fair amount of consistency in the ways that states report their reading test results. The great majority of states (44, or 88%) report individual student scores. Four states (8%) do not report individual student scores; however, they do report classroom, building, or district scores. While 46 states (92%) indicated that reading levels/scores were specifically reported, two others (4%) noted that their assessment was specifically developed to assess overall language arts ability, and that they had purposely left the category score broad to reflect the complexity of the language arts.
States appear very committed to aligning reading and language arts standards and assessments. Only three states (6%) acknowledged a lack of alignment and foresaw a continuing lack of alignment between their standards and their assessment system. Due to various constraints, ranging from budgetary concerns to a lack of legislative or public support, these states did not anticipate progress toward alignment in the near future.
In general, the states reported using a variety of methods to determine alignment. Table 4 indicates the various approaches employed in relation to the types of assessments used. It is interesting to note that the dominant method for demonstrating alignment between standards and state-developed assessments was the state-led study. In contrast, the method used most often to determine alignment between standards and commercial assessments was the publisher-led study. Interestingly, over half of the states that had developed tests reported alignment as a result of sequential development--that is, the assessments were developed to address adopted standards. Only three of the 26 states that used commercial assessments reported that their assessments and standards were developed sequentially. In at least one of these cases, the state selected an assessment first, then adopted standards that were aligned with that assessment.
The two states that reported that the alignment of their state-developed assessments and standards had not been established also gave commercial tests. They reported that these tests were aligned. Similarly, three of the six states that reported that their commercial assessments were not aligned gave state-developed tests that they reported as being aligned. At first glance, then, it appears that many states that gave more than one type of assessment relied more heavily on one type or the other to judge achievement of their standards.
The results of our state survey of standards and assessments indicate that by the end of 1999, 49 states (98%) will have their standards in place, and 40 states (78%) will have implemented their assessments. This increase in the number of states adopting standards and assessments appears to correspond with the changes in Title I requirements that were enacted in 1994.
Our survey indicated that state standards and assessments could be characterized by a great deal of variability in the way the domain was parsed with regard to reading and the other language arts. It also revealed that the majority of states organized both their standards and testing programs around grade-level clusters reflecting primary/elementary, intermediate/middle school, and high school levels. Although the states used a variety of types of reading/language arts assessments (e.g., state-developed and commercial), almost all states did some direct assessment of reading, apart from the other areas of the language arts. Over half of the states used some type of state-developed assessments, and many states used more than one type.
Finally, the survey revealed that the large majority of states considered their reading standards and assessments to be aligned. States used a variety of methods to determine alignment, with the state-led study being the most popular method for state-developed tests, and the publisher-led study being the most common method for the customized commercial assessments.
Officials in State A are very conscious of the importance of aligning their standards and assessments. In 1995, after the state legislature passed a law mandating that norm-referenced tests be given in kindergarten through eleventh grade, a series of committees reviewed the available commercial assessments and selected one, based in large part on their evaluations of content, administration, and norming. The form of the selected assessment is embargoed for the state. In 1997, the state approved standards at every grade level based on the categories of skills addressed in the assessment. In reading and writing, the objectives were modeled on those published with the norm-referenced test and standards developed by the National Assessment of Educational Progress. The testing company was supportive of the results of this process. Alignment of the norm-referenced test to the standards, then, is achieved through sequential development of the test and then the standards. A state-led study has verified this alignment.
For the purposes of this study, we analyzed the two third-grade reading standards (reading comprehension and reading vocabulary) and the norm-referenced test that was used to assess third-grade students' attainment of these standards. Although we did not consider it in this analysis, State A also mandates a state-developed writing test in the fourth, seventh, and eleventh grades. This test was developed after the standards, and alignment was established through sequential development.
State A's elementary language arts standards include listening, speaking, reading comprehension, reading vocabulary, writing, spelling, language, study skills, and technology. The standards that target reading, reading comprehension and reading vocabulary, have the same labels as the sections of the norm-referenced test that target reading. Reading comprehension is explained by 20 objectives; reading vocabulary is explained by five. Most of these objectives are concise, specific, and easy to understand. Examples include "use context clues to determine meaning," "draw conclusions about a sequence of activities in an announcement or advertisement," and "recognize the correct meaning of a word with multiple meanings when presented in text." The objective we judged to be most cognitively complex is "identify theme, main idea, and author's purpose in a selection when it is not explicitly stated." We judged it to be in the third level of cognitive complexity, because demonstrating the attainment of this objective could include identifying abstract themes across a text.
The reading section of the third-grade assessment consists of 84 multiple-choice items, of which 30 are devoted to reading vocabulary and 54 to reading comprehension. The reading comprehension section includes nine passages: four fiction and five non-fiction. The authors of the four fictional passages are indicated. The nine reading selections range in length from 110 to 302 words, with the average length being 197. The reading vocabulary section includes 18 items which ask students to choose the correct definition of a word, six items which require students to use context clues in isolated sentences to identify the correct definition of a word with multiple meanings, and six items which ask them to choose the correct definition of a word with the aid of sentence context clues.
At present, State A determines students' level of proficiency in meeting the standards by considering their percentile scores on the norm-referenced test. Those who score at or above the fiftieth percentile are judged to have achieved satisfactory performance. Those below the fiftieth percentile are judged to have made unsatisfactory progress toward the achievement of the state standards. School accountability for student performance begins at the third grade, and is based in large part on a basic skills score derived from the norm-referenced test. In addition to a readiness test in kindergarten, students in kindergarten, first, and second grade take the norm-referenced test, but these scores are used for diagnostic purposes only.
Range of knowledge correspondence and balance of representation. The second column in Table A.1 indicates that 80% of the total objectives in reading explain the reading comprehension standard, and that State A seems to place more emphasis on this standard than on reading vocabulary. The assessment, however, as indicated by the final column, places more emphasis on vocabulary than do the objectives. It is interesting to note that the final column totals 100%: a result of the fact that each item hit either one standard or the other, but not both. This reflects the method which State A used to ensure alignment--writing standards based on the test.
By our judgment, 64% (approximately two-thirds) of the objectives in reading are addressed by the assessment (see Table A.2). Of the seven unassessed reading comprehension objectives, three are difficult to assess in an on-demand, paper-and-pencil test: e.g., "read literary works by national and international authors to include, but not limited to: legends, folktales, and non-fiction," "chooses and responds to a variety of reading material for pleasure and information," and "experience content through imagery (visualization)." One of the reading vocabulary objectives--"given a variety of reading material, increase the number of recognized words presented in text"--faces the same constraint.
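The Coverage criterion reported above reduces to a simple proportion: the share of objectives hit by at least one assessment item. A minimal sketch of the computation, using hypothetical hit counts rather than State A's actual tallies:

```python
# Coverage: proportion of objectives addressed by at least one item.
# The per-objective hit counts below are hypothetical.

hit_counts = {
    "obj-1": 26, "obj-2": 18, "obj-3": 6,
    "obj-4": 0,  "obj-5": 2,  "obj-6": 0,
}

covered = sum(1 for n in hit_counts.values() if n > 0)
coverage = 100.0 * covered / len(hit_counts)
print(f"coverage: {coverage:.0f}% of objectives assessed")
```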
According to our analysis, the test emphasizes the assessment of two objectives in particular. A comprehension objective--"identify explicitly stated information including, but not limited to: story elements (e.g., setting, characters, plot), a set of directions, and functional reading (e.g. invitations, bulletins)"--is addressed by 26 items. A vocabulary objective--"recognize synonyms, antonyms, homonyms, and homophones for identified vocabulary words presented in isolation or within a group of words"--is assessed by 18 items. Three other objectives (one in reading comprehension and two in vocabulary) are assessed by six items each. Of the remaining eleven comprehension objectives that are represented in the test, three are assessed by only one item and eight are assessed by two or three items.
Table A.3 reveals that the cognitive level of the reading comprehension objectives is, on average, slightly higher than that of the assessment items with which they are associated. Approximately two-thirds of the reading comprehension objectives are rated at Level 2 or above, while only one-third of the reading comprehension items are rated at Level 2; none of the items are rated above Level 2. The cognitive levels of the reading vocabulary objectives and assessment items are, on the other hand, highly aligned at the lower end of the cognitive complexity continuum. Overall, the average cognitive level of the objectives is 1.56, while that of the assessment items is 1.21--the lowest of any of the state assessments evaluated in this study.
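The summary statistics above are simple means over the study's 1-4 cognitive-complexity ratings. A minimal sketch of that computation, using hypothetical ratings rather than the actual data from our analysis:

```python
# Sketch of the averaging behind the cognitive-level figures in Table A.3.
# The ratings below are hypothetical placeholders on the study's 1-4 scale;
# the actual objective- and item-level ratings are not reproduced here.

def mean_level(ratings):
    """Average cognitive-complexity rating for a set of objectives or items."""
    return sum(ratings) / len(ratings)

objective_ratings = [1, 1, 2, 2, 2]  # hypothetical objective ratings
item_ratings = [1, 1, 1, 1, 2]       # hypothetical item ratings

print(mean_level(objective_ratings))  # 1.6
print(mean_level(item_ratings))       # 1.2
```

Comparing the two means, as the report does (1.56 versus 1.21), gives a single-number summary of how much less cognitively demanding the assessment items are than the objectives they are meant to assess.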
State A's standards and objectives can be best characterized by the mastery model of curriculum. For the most part, the objectives read as a list of discrete skills that can be understood independently of the reader. These objectives include "identify explicitly stated information," "determine sequence," "recognize characteristics of a fictional and non-fictional story." Just two of the 25 reading objectives reflect a curriculum model other than mastery. These two--"make predictions based on prior knowledge and story information" and "chooses and responds to a variety of reading material for pleasure and information"--could arguably be located in a cognitive model because they address students' application of prior knowledge and personal response to create meaning from text. Even these two objectives, however, include elements--making predictions from story information, and choosing texts for information--that best reflect a mastery model. None of the objectives approach a social-constructivist perspective.
The norm-referenced test administered by State A is also best characterized by the mastery model. The exam emphasizes skills such as word identification, identification of word definitions in isolated sentences, and verbatim recall of information from passages. Although the exam includes some items that require students to infer, the information needed to make the inferences is explicit in the text. The exam does not require students to relate reading to their own experiences, or to participate actively in the construction of meaning. State A's standards and assessment are fully grounded in a mastery model of curriculum and are, therefore, well-aligned on this criterion.
State B approved its current assessment system in 1990, and first implemented it in the 1990-91 school year. A state-developed reading comprehension test is administered in grades 3 through 8, and a writing assessment is given in grades 4, 8, and 10. The standards developed by State B for English language arts performance were approved in 1997. For our analysis we examined the reading standards and objectives for grade 4, and the reading portion of the grade 4 state assessment. State B considers its assessment and standards to be aligned, and reports that the methods of establishing alignment were sequential development (with the assessment preceding the standards), a state-led study, and an expert review.
State B has developed standards and objectives for each individual grade level, K-12. In addition to the reading standards and objectives which we examined as part of our analysis, their framework document includes standards and objectives for Listening/Speaking (with subcategories in purpose, culture, audiences, communication) and Writing (with subcategories in purposes, penmanship/capitalization/punctuation, spelling, grammar/usage, writing processes, evaluation, inquiry/research). Each subcategory contains a standard followed by several objectives. This state's standards document includes over 100 objectives for English language arts in the fourth grade. As stated in the introduction to this document, the standards and objectives are designed to meet state legislation's goal that "the students in the public education system will demonstrate exemplary performance in the reading and writing of English language." It is also the state's goal that all children will be reading at grade level by the end of third grade.
The State B reading test is a reading comprehension test. It includes six passages, each followed by between four and eight multiple-choice questions. The passages include both fiction and nonfiction. The passages range in length from approximately 100 words to approximately 450 words. The passing rate for the exam was determined by the State Board of Education. In order to pass, students must answer 70% of the multiple-choice items correctly. This is a high-stakes assessment, with a school's overall passing rate (along with percentages of drop-outs and attendance levels) determining its designation as either exemplary, recognized, acceptable, or low-performing.
Table B.1 demonstrates the assessment's emphasis on reading comprehension, as opposed to the other eight standards. Thirty-eight of the 40 assessment items (95%) reflect one or more of the reading comprehension objectives, yet the reading comprehension objectives make up only 22% of the total number of objectives. The standards for Text Structures/Literary Concepts and Inquiry/Research, which account for 19% and 15% of the total objectives respectively, are represented by only 8% and 10% of the assessment items. In contrast, some standards account for a higher percentage of assessment items than of total objectives. For example, Word Recognition accounts for just 5% of the total objectives, but 13% of the assessment items relate to its objectives; Vocabulary Development accounts for just 9% of the total objectives, but 18% of the assessment items.
Overall, it seems that this test's focus is comprehension, and that it therefore does not adequately measure the other standards. Tables B.1 and B.2 together show that although 95% of the assessment relates to the reading comprehension objectives, only 67% of the objectives in this standard are covered by at least one assessment item. Eight of the 12 reading comprehension objectives are covered by 38 of the 40 assessment items, leaving four reading comprehension objectives unaddressed in the assessment. Of the 63 times that an assessment item "hit" a standard, 39 were in reading comprehension. The three standards that are not represented at all in the assessment--culture, fluency, and variety of texts--are difficult to assess with a paper-and-pencil test. In short, if we focus only on comprehension, then the test appears to align well with the objectives; if the entire set of standards and objectives is considered, then there is far less alignment.
Again, the State B assessment covers the reading comprehension objectives to a much greater extent than it covers the objectives in the other standards. As shown in Table B.2, after the comprehension objectives, the objectives reflected next most often in the assessment are those under the Vocabulary Development standard: two of that standard's five objectives (40%) are represented in the assessment. Six of the nine standards are represented by at least one assessment item, but only 30% of the total number of objectives are represented by the assessment. The relatively high number of represented standards is somewhat misleading, however: in every standard except comprehension, just one or two objectives are addressed, and each standard other than comprehension is represented by only two to seven assessment items.
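The range-of-knowledge figures in Tables B.1 and B.2 reduce to a simple set computation: for each standard, the fraction of its objectives hit by at least one item. A minimal sketch, with hypothetical objective identifiers and item-objective pairs standing in for the actual coding data:

```python
# Range-of-knowledge sketch: fraction of each standard's objectives that is
# hit by at least one assessment item. All identifiers below are hypothetical.

def coverage(objectives_by_standard, hits):
    """objectives_by_standard: {standard: set of objective ids};
    hits: iterable of (item id, objective id) pairs."""
    hit_objectives = {obj for _, obj in hits}
    return {std: len(objs & hit_objectives) / len(objs)
            for std, objs in objectives_by_standard.items()}

standards = {"Vocabulary Development": {"v1", "v2", "v3", "v4", "v5"}}
item_hits = [("item01", "v2"), ("item02", "v2"), ("item03", "v5")]

print(coverage(standards, item_hits))  # {'Vocabulary Development': 0.4}
```

The 0.4 in this toy example mirrors the finding that two of the five Vocabulary Development objectives (40%) are represented in the assessment; note that multiple items hitting the same objective add nothing to coverage.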
Tables B.3 and B.4 present information about the comparative levels of cognitive complexity of the standards/objectives and assessment items. All of the objectives under the reading comprehension standard were rated a 2 (83%) or a 3 (17%), indicating that all of those objectives require more than literal recall of information, as well as a certain amount of intersentence or across-passage analysis, inference, summarizing, or theme identification. However, 40% of the items that represent those objectives in the assessment were rated a 1 for cognitive complexity. Forty-eight percent of the items that mapped onto the comprehension standard were rated a 2, and just 10% were rated a 3. Although the standard and objectives do not emphasize literal recall or identification, many of the comprehension assessment items require only that superficial level of analysis.
Similarly, Standard 8 includes no objectives rated at a level 1, while 25% of its objectives were rated a 2, 63% were rated a 3, and 13% were rated a 4. However, 100% of the items that relate to that standard were rated as a 1. The standard requires a level of cognitive complexity that is clearly not embedded in the assessment. In general, the assessment items tend to be less cognitively complex than the objectives. Indeed, the averages across objectives (2.1) and items (1.7) suggest that the cognitive complexity of standards and assessment items is not closely aligned (see Table B.4).
State B's standards appear to best reflect a cognitive model of curriculum. The standards include reference to prior knowledge and the application of readers' experiences in determining the meaning of text. To cite one example, a fourth-grade objective reads: "support responses by referring to relevant aspects of text and own experiences." The objectives emphasize inquiry, requiring students to ask questions of text and investigate questions across sources. For example, another fourth-grade objective reads: "offer observations, make connections, react, speculate, interpret, raise questions, in response to text." These objectives are typical of several objectives across grade levels that emphasize students' role as active participants in the meaning-making process. Although several objectives do emphasize basic skills, and a few require students to examine their own experiences in relation to those of others, the document as a whole reflects neither a mastery nor a social-constructivist perspective.
The State B assessment reflects both a mastery and a cognitivist model. Many of the assessment items ask students to recall specific information from the text, thus testing students' abilities to decode and locate information. Items that require students to do no more than locate information in a passage reflect a mastery model. Other items, however, require the student to infer information from the story. Several questions require students to choose the main idea or the best summary of a passage; these questions require students to participate more actively in the meaning-making process. Although they do not explicitly require respondents to apply prior knowledge, many of them do so implicitly. A very few items seem to reach beyond a mastery model. However, much of the test reflects a mastery model, and to be fully aligned the test would need to reflect more consistently the cognitive model embedded in the standards and objectives.
Motivated largely by the Title I legislation, State C began developing a student assessment system in the early 1990s. By 1997, the state had developed language arts standards and implemented a testing system. State testing consists of a state-developed reading comprehension test given in third grade, as well as a norm-referenced test given in grades 4, 8, and 10. The reading comprehension test is a "student-level" test, and although its results are published and distributed throughout the state, the state has not analyzed the test's alignment with the standards. The core of the testing program is the norm-referenced test, whose alignment to the standards was established through a workshop led by the test's publisher. That study found that 55% of the standards overlapped with the norm-referenced test, and that 98% of the norm-referenced test items are aligned with that 55% of the standards.
Although State C has not included its reading comprehension test in an alignment study, we believe that a consideration of its alignment is critical. If instruction is to be consciously aligned with challenging standards, then all components of the assessment and accountability system must be aligned with those standards. We argue that any test whose results are disseminated throughout the state is, de facto, part of the assessment system. As a result, our analysis included State C's fourth-grade reading standards, the state-developed reading comprehension test given in the third grade, and the reading portion of the norm-referenced test given in the fourth grade.
State C has developed standards at grades 4, 8, and 10, the grades tested with the norm-referenced test. When developing its standards, State C drew from a variety of sources, including in-state studies and numerous externally written standards. State C has four reading and literature standards: "students will use reading strategies to achieve their purposes in reading," "read, interpret, and critically analyze literature," "read and discuss literary and nonliterary texts in order to understand human experience," and "read to acquire information." Each standard is explained in more detail by objectives. Many of the objectives are very long and incorporate many skills and processes: for example, "use a variety of strategies and word recognition skills, including rereading, finding context clues, applying their knowledge of letter-sound relationships, and analyzing word structures" and "summarize ideas drawn from stories, identifying cause-and-effect relationships, interpreting events and ideas, and connecting different works to each other and to real-life experiences." These compound objectives made it difficult to determine whether a particular assessment item was assessing a given objective.
It is also more difficult for us to evaluate alignment when the standards and objectives overlap. For example, six of Standard 1's eight objectives identify reading strategies; two do not. Objective 4 includes "comprehend reading by... establishing purpose...." Objective 8 is "identify a purpose for reading such as..." We also had difficulty evaluating the extent to which assessment items related to overly general standards/objectives. For example, Standard 4 is "read to acquire information," which could refer to almost anything in a reading comprehension assessment. In this case, we inferred on the basis of the accompanying objectives that this standard was intended to refer to interaction with non-fiction text.
The state-developed reading comprehension test consists of 104 multiple-choice items. Sixteen of these items assess students' prior knowledge of concepts and the vocabulary needed for easy comprehension of two passages. Because these items are included for purposes of school- and district-level analysis of results and do not contribute to student scores, we disregarded them in our assessment of alignment. The test comprises three passages: two fiction (1126 and 1543 words) and one non-fiction (629 words). The average passage length is 1099 words.
State C reports students' performance on the test in four categories: advanced, proficient, basic, and minimal. The cutoff scores for these levels are determined by the state. Although the exact numbers vary slightly by year, students who answer approximately 79% or more of the items correctly achieve the proficient level. Students who fail to show proficiency may be considered for remedial reading services.
The reading language arts sections of the norm-referenced test, which is given to students at the end of fourth grade, consist of 64 multiple-choice items. Nineteen of these items, however, are not included in our analysis because they assess grammar or writing rather than reading. A few of the 45 analyzed items are in fact writing or grammar items, but we included them because they are related to reading passages and could serve as evidence of student attainment of a reading standard. The analyzed portion of the test contains six passages: four fiction and two non-fiction. These passages range in length from 133 to 423 words; the average passage length is 297 words. State C uses the cut scores provided by the publishing company to determine whether student attainment of the standards is minimal, basic, proficient, or advanced. Because pattern scores, rather than raw scores, are used to determine students' proficiency levels, we can say only that students who achieve the proficient level answered approximately 67% of the items correctly; the actual number of correct answers varies with each student's pattern of answers to simple and difficult questions.
Range of knowledge correspondence and balance of representation. Table C.1 indicates that over 40% of the total objectives explain Standard 1, "students will use reading strategies to achieve their purposes in reading." The rest of the objectives are fairly evenly distributed among the other three standards. The vast majority of items in both the state-developed reading comprehension test and the norm-referenced test assess Standard 1. More than half of the total items assess Standard 2.
Most of the items in both tests address only a few objectives: "comprehend reading by using strategies such as activating prior knowledge, establishing purpose, self-correcting..." (Standard 1, Objective 4), "recognize and recall elements and details of story structure... in order to reflect on meaning" (Standard 2, Objective 1), and "extend the literal meaning of a text by making inferences, and evaluate the significance and validity of texts" (Standard 2, Objective 4). This finding reflects our decision rules. All but 15 of the items on the reading comprehension test were read by the students and therefore require comprehension, and so were judged to assess Standard 1, Objective 4. Only seven items, all on the reading comprehension test, do not assess Standard 1; these seven do not require understanding of the preceding passage in the test, and their question prompts and possible responses are read to the students by the test administrator. Similarly, any item that required the recall of details in a story was judged to assess Standard 2, Objective 1, even though few items fully addressed that objective, and any item that required an inference from the text was judged to assess Standard 2, Objective 4, even though no items required students to evaluate the significance and validity of texts. Our assessment of balance, then, is generous.
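The decision rules described above can be sketched as a small classifier. The item attributes here are hypothetical labels a rater might assign; they are not drawn from the actual instruments:

```python
# Sketch of the hit-coding decision rules applied to the State C tests.
# Attribute names are hypothetical rater judgments, not actual test data.

def objectives_hit(item):
    """Return the (standard, objective) pairs an item is judged to assess."""
    hits = []
    if item.get("student_reads_passage"):   # student must comprehend the text
        hits.append(("Standard 1", 4))
    if item.get("requires_detail_recall"):  # recall of story details
        hits.append(("Standard 2", 1))
    if item.get("requires_inference"):      # inference from the text
        hits.append(("Standard 2", 4))
    return hits

# A hypothetical item that the student reads and that requires an inference:
print(objectives_hit({"student_reads_passage": True, "requires_inference": True}))
# [('Standard 1', 4), ('Standard 2', 4)]
```

Because any comprehension-requiring item counts as a hit on Standard 1, Objective 4, this style of coding concentrates hits on a handful of objectives, which is why we characterize our assessment of balance as generous.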
Table C.2 reveals that approximately two-thirds of the standards and objectives are covered by at least one assessment item. This table also indicates that the state-developed test does not add much coverage beyond that provided by the norm-referenced test. Given the inattention to alignment of the reading comprehension test, it is not surprising that only about 40% of the objectives are addressed by that test. The norm-referenced test, on the other hand, addresses the majority of objectives in three of the four standards.
Four of the seven objectives that are not covered by either assessment seem fairly difficult to assess in a paper-and-pencil test--"demonstrate phonemic awareness by using letter/sound relationships as aids to pronouncing and understanding unfamiliar words and text," (Standard 1, Objective 3) "read aloud with age-appropriate fluency, accuracy, and expression," (Standard 1, Objective 5) "identify a purpose for reading, such as gaining information, learning about a viewpoint, and appreciating literature," (Standard 1, Objective 8) and "identify a topic of interest then seek information by investigating available text resources" (Standard 4, Objective 2). The other three objectives that are not covered by either assessment seem relatively easy to address in an on-demand, paper-and-pencil test. "Identify and use organizational features of texts, such as headings, paragraphs, and format, to improve understanding" (Standard 1, Objective 7) can be at least partially assessed in a paper-and-pencil test. The other two perhaps could be more fully addressed. They are "distinguish fiction from nonfiction, realistic fiction from fantasy, biography from autobiography, and poetry from prose" (Standard 3, Objective 3) and "summarize key details of informational texts, connecting new information to prior knowledge" (Standard 4, Objective 1).