The CIERA School Change Project: Supporting Schools as They Implement Home-Grown Reading Reform
Barbara M. Taylor, University of Minnesota
P. David Pearson, University of California-Berkeley
Debra Peterson, University of Minnesota
Michael C. Rodriguez, University of Minnesota
We know a great deal about what schools and teachers can do to promote reading success in the elementary grades. We also possess a great deal of knowledge about school change and the importance of professional development. However, we are challenged by our apparent inability to put our knowledge to work. Even though we continue to learn more about effective schools, effective instruction, and effective change efforts, we seem hard-pressed to integrate and apply this knowledge in ways that positively impact the thousands of schools that are struggling to teach all children to read.
In the past, numerous studies of high-performing high-poverty schools have pointed to important building-level factors that must be in place in order for all children to achieve at high levels in reading. Emphasizing outcomes in reading achievement, Hoffman (1991) summarized the research on effective schools from the 1970s and early 1980s (e.g., Venezky & Winfield, 1979; Weber, 1971; Wilder, 1977). He discussed eight recurring attributes of effective schools:
In recent years, we have seen a revival of effective schools research, most likely due to widespread national concerns about student reading achievement. Taylor, Pressley, and Pearson (2002) summarized findings from five large-scale research studies on effective, high-poverty elementary schools, which were published between 1997 and 1999 (Charles A. Dana Center, 1999; Designs for Change, 1998; Lein, Johnson, & Ragland, 1997; Puma, Karweit, Price, Ricciuiti, Thompson, & Vaden-Kiernan, 1997; Taylor, Pearson, Clark, & Walpole, 2000). The six recurring themes that emerge from these five studies both support and extend the earlier research on effective schools.
Putting the students first to improve student learning. In four of these studies (Charles A. Dana Center, 1999; Designs for Change, 1998; Lein et al., 1997; Taylor et al., 2000), improved student learning was cited as the schools' overriding priority. Also, schools reported a collective sense of responsibility for school improvement. Teachers, parents, the principal, and other school staff members worked as a team to achieve their goal of substantially improved student learning and achievement.
Strong building leadership. Three of the studies (Designs for Change, 1998; Lein et al., 1997; Puma et al., 1997) documented the importance of strong building leadership. The principal may have worked to redirect people's time and energy, to develop a collective sense of responsibility for school improvement, to secure resources and training, to provide opportunities for collaboration, to create additional time for instruction, and to help the school staff persist in spite of difficulties.
Strong teacher collaboration. In addition to, or perhaps because of, strong leadership, strong staff collaboration was highlighted in four of the studies (Charles A. Dana Center, 1999; Designs for Change, 1998; Lein et al., 1997; Taylor et al., 2000). Teachers planned and taught together, with a focus on how to best meet students' needs. They reported a strong sense of building communication, talking and working across, as well as within, grades, which contributed to better understanding of one another's curricula and expectations.
Focus on professional development and innovation. Four of the studies (Charles A. Dana Center, 1999; Designs for Change, 1998; Lein et al., 1997; Taylor et al., 2000) stressed ongoing professional development and the implementation of new research-based practices. Many of the successful schools in these studies emphasized a type of sustained professional development in which teachers learned together within a building and collaborated to improve their instruction.
Consistent use of student performance data to improve learning. Four of the studies (Charles A. Dana Center, 1999; Designs for Change, 1998; Lein et al., 1997; Taylor et al., 2000) found that teachers in effective schools systematically shared student assessment data, usually on curriculum-embedded measures, as a part of the process of making instructional decisions to improve pupil performance. Teachers also worked together to carefully align instruction to standards and state or district assessments.
Strong links to parents. All five studies (Charles A. Dana Center, 1999; Designs for Change, 1998; Lein et al., 1997; Puma et al., 1997; Taylor et al., 2000) reported strong efforts within schools to reach out to parents. Schools worked to win the confidence of parents and then built effective partnerships with them in order to support student achievement. Parents were treated as valued members of the school community. Schools also reported a positive school climate, good relations with the community, and high levels of parental support.
Research on effective school reform and professional development. Research on effective school reform and teacher professional development is consistent with the research on effective schools in general, in that it stresses the importance of teachers learning and changing together over an extended period of time, as they reflect on their practice and implement new teaching strategies (Fullan, 2000; Fullan & Hargreaves, 1996; Louis & Kruse, 1995; Richardson & Placier, in press). In successful schools, which typically operate as strong professional learning communities, teachers systematically study student assessment data, relate the data to their instruction, and work with others to refine their teaching practices (Fullan, 2000). Reflective dialogue, deprivatization of practice, and collaborative efforts all enhance shared understandings and strengthen relationships within a school (Louis & Kruse, 1995).
The knowledge base for effective teaching, especially teaching reading in the elementary grades, is equally strong. In a recent NEA research report, Taylor, Pressley, and Pearson (2002) summarize this research, noting several distinct historical waves of work. From the process-product research of the 1960s and 1970s (Brophy, 1973; Dunkin & Biddle, 1974; Flanders, 1970; Soar & Soar, 1979; Stallings & Kaskowitz, 1974) we learned that more effective teachers maintained an academic focus, kept a high incidence of pupils on task, and provided direct instruction. Effective direct instruction included making learning goals clear, asking students questions as part of monitoring their understanding of what was being covered, and providing feedback to students about their academic progress. Effective classrooms were found to be warm, democratic, and cooperative, with more teacher instruction devoted to weaker students, who were also given more time to complete tasks.
A second wave of research on teaching reading, which began with the work of Duffy and Roehler in the 1980s, taught us about the cognitive processes used by outstanding teachers. More effective teachers engaged in modeling and explanation to teach students strategies for decoding words and understanding texts. Knapp and associates (Knapp, 1995) found that effective teachers stressed higher-level thinking skills more than lower-level skills. Continuing in this tradition, Taylor et al. (2000) found that accomplished primary grade teachers provided more small-group than whole-group instruction, had high pupil engagement, had a preferred teaching style of coaching as opposed to telling, and engaged students in more higher-level thinking related to reading than other teachers.
In the most recent wave of research, Pressley and his colleagues (Pressley, Wharton-McDonald, Allington, Block, Morrow, Tracey, Baker, Brooks, Cronin, Nelson, & Woo, 2001) have focused our attention on the characteristics of teachers nominated as exemplary in practice by their peers and supervisors. These researchers found that effective primary grade teachers did provide a balanced literacy program: they taught skills and got their students actively engaged in a great deal of actual reading and writing. They also encouraged students to self-regulate their use of strategies. Interestingly, the National Reading Panel Report (2000) implicated balanced literacy instruction in its conclusion that instructional attention to systematic phonics, phonemic awareness, fluency, and comprehension strategies was important to a complete reading program (pp. 2-89).
In short, we have learned different, but complementary, lessons about the teaching practices of excellent elementary literacy teachers from the last four decades of research on effective teaching. The overall picture is consistent with the earlier process-product research to some extent, especially with regard to engagement, but goes beyond it in ways consistent with Duffy, Roehler, et al.'s (1987) direct explanation approach and Knapp and associates' (1995) emphasis on higher-order literacy instruction (i.e., instruction which emphasizes comprehension and communication). Excellent elementary literacy teachers balance skills instruction with more holistic teaching (Pressley, 1998). In the best classrooms, students are engaged much of the time in reading and writing, with the teacher monitoring student progress, encouraging continuous improvement and growth, and providing scaffolded instruction to help students improve their use of various strategies.
Amidst pressure for schools to adopt off-the-shelf reform programs as a way of improving student achievement (Herman, 1999), it is interesting to note that, by and large, the schools in the studies summarized by Taylor, Pressley, and Pearson (2002) did not necessarily view packaged reforms as the key ingredient for improving student achievement (Charles A. Dana Center, 1999; Designs for Change, 1998; Taylor et al., 2000).1 The common denominators seem to be commitment and hard work focused on research-based practices at both the classroom level and the school level.
The overall objective of this project is to test the efficacy of a school reform framework which was designed to be used by elementary schools, in order to develop local reading programs that would improve students' reading achievement. The study is guided by two fundamental questions:
In our attempt to answer these questions, we did not, nor do we think that we can or should, use a classic experimental paradigm. We did not, for example, randomly assign programs or even particular programmatic components to schools and teachers; to do so would have violated what we have learned from the last 20 years of research on school change--that school staffs must be involved in creating the programs for which they will be held accountable. However, it is neither necessary nor desirable to invite each and every school to rediscover the wheel. Therefore, what we did was to offer school staffs a framework for making their own decisions about how they might redesign their reading program. The framework consists of a set of six components derived from research-based knowledge about how to build an effective reading program. These components include classroom reading instruction, school reading programs, reading interventions, school-home-community relations, school change processes, and professional development. Each component is made available to a school via an Internet-based multimedia program offering research summaries, readings, video clips of effective practice, and learning activities to guide local action. Support for implementation of the framework is provided to schools through assessment tools and the data obtained with those tools, an external facilitator, an internal leadership team, schoolwide efforts, and study groups that focus on implementing effective practices.
Five schools served as project schools and used the CIERA School Change Framework in 1999-2000. Three of these schools continued with the project in 2000-2001, and six additional schools joined the project at this time. All were high-poverty schools, with 70-95% of the students qualifying for subsidized lunch. Across schools, 2-68% of the students were non-native speakers of English, and 67-91% of the students were members of minority groups. The 11 project schools were from eight different school districts spread across a rural area in the southeastern U.S.; an eastern city; two small towns in the Midwest; a large city in the Midwest; and a large city in the southwestern U.S. In order to become a project school, at least 75% of the teachers in a building had to agree to participate in the project. In all schools, two teachers per grade were randomly selected and invited to participate in the classroom observations, interviews, and completion of instructional logs. If a teacher declined, children from her classroom remained in the school-level analyses. Because the grade levels within buildings differed, children in 7 schools came from grades K-5, in 3 schools from K-6, and in 1 school from grades K-3. Within the designated classrooms, teachers were asked to divide their classes into thirds (high, average, and low) in terms of perceived reading ability; children were then randomly selected from each third. In the fall, 9 children were randomly selected as target students: 3 each from the high, middle, and low thirds of the classroom continuum. In the spring of 1999-2000, due to resource limitations, 2 high, 2 average, and 2 low children were randomly selected from each classroom for post-testing. In the spring of 2000-2001 as many as possible of the 9 children per class who remained at the school were tested: the average was 8 children per class.
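The within-classroom selection described above is a stratified random sample: the teacher sorts the class into thirds, and a fixed number of children is then drawn at random from each third. The following is a minimal sketch of that procedure, not the project's actual sampling code; the roster, the student labels, the function name `select_target_students`, and the fixed seed are all hypothetical.

```python
import random


def select_target_students(roster_by_third, per_third=3, seed=None):
    """Randomly pick `per_third` students from each ability third.

    `roster_by_third` maps a label ("high", "average", "low") to the list
    of students the teacher placed in that third.
    """
    rng = random.Random(seed)
    selected = {}
    for third, students in roster_by_third.items():
        # Sample without replacement; take everyone if the third is small.
        k = min(per_third, len(students))
        selected[third] = rng.sample(students, k)
    return selected


# Hypothetical classroom roster, already divided into thirds by the teacher.
roster = {
    "high": ["H1", "H2", "H3", "H4", "H5", "H6", "H7"],
    "average": ["A1", "A2", "A3", "A4", "A5", "A6", "A7"],
    "low": ["L1", "L2", "L3", "L4", "L5", "L6", "L7"],
}

# Fall selection: 3 children per third, 9 target students in total.
targets = select_target_students(roster, per_third=3, seed=0)
for third, students in targets.items():
    print(third, students)
```

Calling the same function with `per_third=2` would correspond to the reduced spring 1999-2000 post-test sample of 2 high, 2 average, and 2 low children per classroom.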
The children who were randomly selected for participation were assessed in the fall and spring on a number of literacy measures, which varied depending on grade level. Assessments included a standardized reading comprehension test (grades 1-6) as well as tests considering letter-name knowledge (K-1), rhyme (K-1), phonemic awareness (K-1), word dictation (K-1), concepts of print (K-1), fluency (words correct per minute; Deno, 1985) (1-6), and writing (responding to a common prompt) (1-6). See Table 1 for details.
In the fall, kindergarten children were individually assessed (Pikulski, 1996) on letter-name knowledge (students were asked to give the names of the upper- and lowercase letters); concepts of print (students responded to 8 items dealing with concepts related to words, letters, sentences, tracking, etc.); and rhyming ability (students responded to eight items, each giving a word that rhymed with a prompt). In the spring, kindergarten students were individually assessed on letter-name knowledge, concepts of print, and rhyme. They also completed an individually-administered, 12-item phonemic segmentation and blending test, in which they segmented words into phonemes and blended phonemes into words (Taylor, 1991), and a group-administered word dictation test in which they wrote 15 pre-primer and 15 primer words (Colt, 1997).
In the fall, children in grade 1 were individually assessed on letter-name knowledge, and phonemic segmentation and blending, and children were assessed in small groups on word dictation. In the spring all students were individually assessed on reading fluency (in which students read aloud for 1 minute to obtain a score for the number of words read correctly in 1 minute; Deno, 1985) based on a grade-level passage from the Basic Reading Inventory (BRI) (Johns, 1997). In a group setting students took the reading comprehension subtest of the Gates-MacGinitie Reading Test (MacGinitie, MacGinitie, Maria, & Dreyer, 2000) and responded to a writing prompt in which papers were scored according to a 4-point scoring rubric (Michigan Literacy Progress Profile, 1998).
In the fall, children in grades 2-6 were individually assessed on fluency (words correct per minute) based on their reading of a BRI passage (Johns, 1997) that was one grade level below their grade placement. As a group they took the comprehension subtest of the Gates-MacGinitie Reading Test (MacGinitie, MacGinitie, Maria, & Dreyer, 2000) and a writing prompt. In the spring, all children were assessed on fluency using a passage at grade level (Johns, 1997), on reading comprehension (Gates), and on writing (using the same prompt as was used in the fall).
Each response to the writing prompt was scored by one person from a team of trained scorers according to a rubric. Twenty-five percent of the writing samples at each grade level were scored by a second scorer, with 83% agreement between the 2 scorers.
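Percent agreement between two scorers can be computed as the share of double-scored papers on which both assigned the same rubric score. The sketch below assumes exact-match agreement on the 4-point rubric; the source does not say whether exact or adjacent agreement was used, and the scores shown are invented for illustration.

```python
def percent_agreement(scores_a, scores_b):
    """Return the percentage of items on which two scorers gave the same score."""
    if len(scores_a) != len(scores_b):
        raise ValueError("Both scorers must rate the same set of papers.")
    if not scores_a:
        raise ValueError("At least one double-scored paper is required.")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return 100.0 * matches / len(scores_a)


# Invented rubric scores (1-4) for six double-scored writing samples.
first_scorer = [3, 2, 4, 1, 3, 2]
second_scorer = [3, 2, 3, 1, 3, 2]

agreement = percent_agreement(first_scorer, second_scorer)
print(f"{agreement:.1f}% agreement")
```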
Teachers voted by secret ballot on whether to participate in the school change project. Schools were eligible to join the project if 75% of the teachers in the building voted to participate. Staff agreed to meet for a minimum of 1 hour a month as a large group to work on the school change effort, and 1 hour a week, on average, in smaller and more focused study groups. A school leadership team made up of teachers, the principal, and an external facilitator (who was to spend a minimum of 8 hours a week in the building) was responsible for guiding the staff through the school change activities. Large-group activities were to include discussion and action on the schoolwide reading program, early reading interventions, and parent partnerships, as well as on issues related to school change and professional development. Reports were also expected from the study groups. Small-group activities were to include within-grade and across-grade study groups which focused on particular aspects of classroom reading instruction and student work (e.g., comprehension instruction, phonemic awareness instruction). Groups were encouraged to review the research on the CIERA School Change website; to download, read, and discuss articles on research-based practices related to their focus area; to view and discuss video clips of effective practice on the site; and to share video clips of their own practice. Members of study groups also raised issues, solved problems, and developed action plans related to their focus area to make changes in their classroom reading instruction.
In addition to these components, schools agreed to several other practices and commitments in this multi-year project: cross-grade collaboration; the development of a plan to involve parents as partners in the delivery of their school reading program; and the commitment to continue with the project for at least 2 years, addressing, across that time span, all six major categories of the school change framework.
Use of data emanating from the project was also an important component. At the beginning of Year 1, facilitators received a summary of the Beating the Odds research (Taylor et al., 2000) on characteristics of effective schools and teachers, which they were asked to share with the teachers in their schools. Teachers also completed a checklist asking about various topics they felt they should cover during the year on characteristics of effective schools and teachers; these topics were covered on the Internet site. The purpose of both of these activities was to help schools set priorities for study groups and large-group meetings.
At the beginning of the second year, returning schools received a summary of the Beating the Odds research and a personalized school report that focused on their performance on school and classroom variables, as compared to the mean of other schools in the study. Schools new to the project received a generic version of the school report that included the cross-school means for school and classroom variables. The report included correlations identifying the school and classroom factors which are related to growth in students' reading and writing ability. Finally, the teachers also completed a questionnaire about their perceptions regarding the presence of various school and classroom characteristics, and their opinions about where their school should focus its reform efforts during the upcoming year. This questionnaire--like its predecessor, the previous year's checklist--was tied to topics covered on the Internet site, and was designed to help schools set priorities for their study groups and overall reform efforts.
Teachers were interviewed in the fall, winter, and spring; principals were interviewed in the fall and spring. The interview data were used primarily to document program features and participant beliefs. Each interview lasted about 30 minutes.
Teachers meeting in study groups were asked to complete a common study group meeting form after each session and develop an action plan. The external facilitators were asked to keep brief monthly notes summarizing the activities pertaining to the school change project that had transpired at their school. They were also asked to write an end-of-year report. The data from the notes, action plans, and end-of-year report were used to document the change process at the school level.
On three occasions (fall, winter, spring), each teacher who agreed to be in the data collection sample was observed for an hour during reading instruction time, to document their classroom practices in the teaching of reading. All observations were scheduled. The observers were trained to use the CIERA Classroom Observation Scheme (Taylor et al., 2000), and were expected to demonstrate at least 80% agreement with a "standard" coding at each of the seven levels of the coding scheme (Taylor et al., 2000), prior to conducting classroom observations.
The observation system (influenced by the work of Scanlon & Gelzheiser, 1992; Greenwood et al., 1995; and Ysseldyke & Christenson, 1993-96) combined qualitative note-taking with a more quantitatively-oriented coding process. The observer took field notes for a 5-minute period, recording a narrative account of what was happening in the classroom, including, where possible and appropriate, what the teacher and children were saying. At the end of the note-taking period the observer recorded the proportion of children in the classroom who appeared to be on task (i.e., doing what they were supposed to be doing). The observer then coded the three or four most salient literacy events (Category 4 codes) that occurred during that 5-minute episode. For every Category 4 event, the observer also coded who was providing the instruction (Category 1), the grouping pattern in use for that event (Category 2), the major literacy activity (Category 3), the materials being used (Category 5), the teacher interaction styles observed (Category 6), and the expected responses of the students (Category 7). An example of a 5-minute observational segment is provided in Table 2. (See Table 3 for a list of the codes for all the categories.) In Table 2, the codes "c/s/r" refer to categories 1-3, and the codes "r/t/a/r", "wr/t/c/or-I", and "v/t/r/or" each refer to categories 4-7.
[Table 2. Example of a 5-minute observational segment. Only fragments of the table survive: "Phonics" and "Number of students on task".]
At the end of the observation, the observer wrote a summary addressing seven key features of the classroom ecology: (a) the general instructional approach used in the classroom, instructional sequences observed, approaches to word recognition, vocabulary, and comprehension instruction; (b) curriculum materials used; (c) teacher's style of interacting with the children; (d) teacher's grouping practices, and activities of children not with the teacher; (e) student engagement; (f) classroom management; and (g) classroom climate.
The observations were used as a source of data for individual teachers. In February of Year 1 teachers were invited by letter to receive, upon request, copies of their first two classroom observations and an explanation of the codes used in the observations. At one school, 75% of the teachers asked for copies of their observations; at three more schools, requests for feedback ranged from 42 to 50% of teachers; and at the fifth school, only one teacher out of 14 asked for feedback. In Year 2, based on requests from teachers for regular feedback related to their observations, teachers received a copy of each observation, a description of the codes, a brief summary of research related to the major coding categories being analyzed for the project (e.g., incidence of whole group instruction, and incidence of higher-level questioning; see page 17), and a table summarizing the codes from their observations (e.g., the incidence of whole-group instruction, the incidence of higher-level questioning, etc.) and comparing them to the means at their grade level across all schools. External facilitators received training in how to interpret observations so that they, in turn, could help teachers understand the information contained in these observations without making the interpretations for them. Teachers were encouraged to go to the facilitators with questions.
In addition to the observations, in 2000-2001 teachers also completed six logs, one each for a high-, average-, and low-ability student, for an entire week in the winter, and once again for the same three students in the spring. Teachers used the log to record time spent on various literacy activities, types of texts and materials used, and grouping practices.
Data from the interviews were used to document the characteristics of various school-level factors at each school site, as well as the extent of the reform effort at that school. The meeting notes and action plans completed by the study groups, along with the monthly notes and end-of-year report completed by the facilitators, were used to further describe the extent of implementation of the change process at the project schools.
We applied a coding rubric to the interviews in order to evaluate teachers' perceptions of the degree to which the following factors, previously found to be important in effective schools (see pp. 1-3), existed in their schools: (a) building collaboration in the delivery of reading instruction; (b) links to parents; (c) reflection and change pertaining to instruction; (d) collaborative professional development; and (e) strong building leadership (and the extent to which this leadership was invested in the teachers, as well as the principal).2
[Table fragment: "Mean rating for all schools". The rest of the table is not recoverable.]
Each teacher's set of three interviews was used to rate school-level factors (collaborative leadership, building collaboration in the teaching of reading, reflection on teaching, collaborative professional development, and links to parents) on a 4-point scale, which was designed to capture the strength or degree to which each factor was perceived to be present in that school: (0 = very low perception, 1 = low, 2 = moderate, and 3 = high). This coding rubric is presented in Table 4. All of the interviews were coded by one member of the research team. A second team member independently coded the interviews from a random sample of 25% of the teachers; the mean agreement on overall rubric scores was 87% across the five variables.
The five ratings were summed to generate a school effectiveness score for each school in the study. The 11 schools from Year 2 and the three schools from Year 1 had a mean school effectiveness score of 8.3 (SD = 1.7), with a range from 5.4 to 10.2 (out of a total possible score of 15).
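One way to picture the school effectiveness score is as the sum of five factor ratings, each on the 0-3 scale, averaged over the teachers interviewed at a school. The sketch below rests on that assumption: the source states that the five ratings were summed, but it does not spell out how ratings from different teachers were combined. The factor keys and the two teachers' ratings are invented.

```python
from statistics import mean

# Hypothetical short keys for the five school-level factors.
FACTORS = (
    "leadership",
    "collaboration",
    "reflection",
    "professional_development",
    "parent_links",
)


def school_effectiveness(teacher_ratings):
    """Sum the school's mean rating on each factor (maximum 5 x 3 = 15).

    Assumption: teacher-level ratings are averaged within a school
    before the five factors are summed.
    """
    for ratings in teacher_ratings:
        for factor in FACTORS:
            if not 0 <= ratings[factor] <= 3:
                raise ValueError(f"{factor} rating must be between 0 and 3")
    factor_means = [mean(r[factor] for r in teacher_ratings) for factor in FACTORS]
    return sum(factor_means)


# Invented ratings for two teachers at one school.
ratings = [
    {"leadership": 2, "collaboration": 2, "reflection": 1,
     "professional_development": 2, "parent_links": 3},
    {"leadership": 1, "collaboration": 2, "reflection": 2,
     "professional_development": 1, "parent_links": 2},
]

score = school_effectiveness(ratings)
print(score)
```

Averaging before summing yields non-integer school scores, which is consistent with the reported range of 5.4 to 10.2.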
Although schools had agreed, in principle, to the conditions for the study, they exhibited considerable variability in their degree of adherence to the reform framework. Factors considered important to the reform included the following: (a) meeting for 1 hour a week in study groups; (b) meeting in cross-grade groups; (c) reflecting on teaching in study groups; (d) considering research-based "best practices" in study groups; (e) completing action plans in study groups; (f) selecting substantive topics for study; (g) maintaining topics over time; (h) meeting as a whole faculty once a month to discuss reform efforts; (i) working on parent partnerships and making effective use of the external facilitator; and (j) making effective use of the internal leadership team. Using the comments of each teacher across the three interviews, the study group meeting notes, study group action plans, facilitator logs, and the end-of-year reports, we built a scale indicating the degree to which a school was perceived to be implementing the various components of the school change framework (see Table 5). We then calculated a mean reform effort score for each school. The mean score was 4.2 (SD = 1.6) out of a possible 10 points. One member of the research team rated each school on each of the 10 dimensions of implementing the reform. A second member of the research team also read through the artifacts and rated each school. The Pearson correlation coefficient across the two scorers' ratings was .92.
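The reform effort scale and the inter-rater check can be sketched as follows. Each school earns one point per component judged to be met, and the two raters' school-level scores are then compared with a Pearson correlation. This is an illustrative sketch only: the helper names and the rater scores are invented, and the correlation is computed from its textbook definition rather than with whatever statistical package the authors used.

```python
from math import sqrt


def reform_effort_score(components_met):
    """Award one point for each of the 10 reform components judged to be met."""
    return sum(1 for met in components_met if met)


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("Need two lists of equal length with at least 2 items.")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("Correlation is undefined when a rater's scores are constant.")
    return sxy / sqrt(sxx * syy)


# Invented example: one school meets 4 of the 10 components.
one_school = [True, False, True, True, False, False, True, False, False, False]
print(reform_effort_score(one_school))

# Invented reform effort scores from two raters for five schools.
rater_one = [3, 5, 4, 6, 2]
rater_two = [3, 4, 4, 6, 3]
r = pearson_r(rater_one, rater_two)
print(round(r, 2))
```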
Note: One point was awarded for each of the reform components if the criteria in parentheses for a particular component were judged to be met.
Coding the observations. As the first author of this paper visited research sites, she joined each observer in an observation, in order to establish inter-rater reliability data on the observation coding scheme. Across these 12 observations, agreement with the senior author ranged from 77% to 98%: 98% at Level 2 (grouping), 87% at Level 3 (major literacy focus), 80% at Level 4 (specific literacy activity), 90% at Level 5 (material), 80% at Level 6 (teacher response), and 77% at Level 7 (student response).
An expert observer who had done many classroom observations using this scheme and helped to refine it read through all of the observations to further assess the degree to which observers were using the codes in a similar manner. For example, although decision rules had been established in order to help an observer distinguish between similar codes, one observer may have coded a teacher's reference to the main idea of a story as a comprehension skill, while another observer might have coded a very similar exchange as a higher-level question about the story. The expert observer did not code the observations "blind." Instead, she recorded a different code only if she could not agree with the observer's code after reading the narrative description of a particular 5-minute segment. For a random sample of 10% of the observations, the agreements between the observers and expert observer at each of the levels of coding were measured as follows: 97% at Level 2 (grouping), 96% at Level 3 (major literacy focus), 80% at Level 4 (literacy activity), 86% at Level 5 (material), 77% at Level 6 (teacher response), and 83% at Level 7 (student response). Since there was variability between the observers and the expert, especially at Levels 4, 6, and 7, a decision was made to use the expert's codes for those instances in which the observer and expert disagreed, in order to ensure maximum consistency across the many observers.
A second expert reviewer, a member of the research team, read through the same random sample of 10% of the observations. The agreement between the first and second expert at each of the levels of coding was very high: 98% at Level 2 (grouping), 97% at Level 3 (major literacy focus), 93% at Level 4 (literacy activity), 97% at Level 5 (material), 91% at Level 6 (teacher response), and 93% at Level 7 (student response).
Certain aspects of the data from classroom observations (i.e., those found to be important in previous research; cf. pp. 2-3) were analyzed in order to investigate the relationship between various classroom instructional practices and students' reading and writing ability. The classroom practices which were analyzed included the following (see Table 6 for descriptions of the categories):
For the 10% sample of observations described above, the second expert reviewer agreed with the first about the codes which made up the variables used in the data analyses: 100% whole-group, 99% small-group, 95% vocabulary instruction, 91% phonemic awareness instruction, 91% phonics instruction, 94% coaching in word-level strategies, 96% asking lower-level questions, 82% asking higher-level questions, 100% comprehension skill instruction, 88% comprehension strategies instruction, 94% teacher-directed stance, 92% student-support stance, 95% active responding, and 97% passive responding.
*Percent of time (5-minute segments) coded.
**Percent of all reading segments coded.
***Percent of all codes for teacher interaction.
****Percent of all codes for student responding.
Statistical analysis. Hierarchical linear modeling (HLM; Raudenbush, Bryk, & Congdon, 2000) was used to investigate the impact of school-level and classroom-level characteristics on students' reading growth. Descriptive analyses were also conducted to elaborate on the quantitative findings.
HLM is a method of completing regression at multiple levels. The analyses in this study employed a two-level HLM model in which students were nested within classrooms or schools. Schools and classrooms were analyzed separately because there were different numbers of students at the school level than at the classroom level, since students whose teachers had declined to participate in the classroom observations were still included in the assessments for the school-level analysis. The number of schools was also insufficient to obtain stable results from a three-level model, in which students are nested within classrooms and classrooms are nested in schools.
HLM essentially estimates a regression within each school or classroom and combines these to see if they point to a common regression across schools or classrooms. When regressions (either the intercepts or slopes) vary across schools, then we can examine the school-level or classroom-level characteristics that may explain such variation. This is a common method for evaluating school-level and classroom-level factors and their effects on student outcomes. A simple regression would be inappropriate in these situations, since it would violate the independence assumption.
HLM also partitions variance components across levels, providing an estimate of the variance in student performance within and between classrooms or schools. An unconditional HLM, that is, one with no explanatory variables, allows us to answer the question: how much of the variance in student outcomes can be attributed to systematic differences between classrooms or schools? This analysis is equivalent to a random-effects analysis of variance. Estimation using HLM rests on assumptions similar to those of multivariate multiple regression.
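For readers who prefer notation, a two-level model of the kind described above might be written as follows. This is an illustrative sketch, not the exact specification estimated in the study. The symbols are assumptions introduced here: Y_ij is the spring score of student i in school (or classroom) j, X_ij is that student's fall score, and W_j is a school-level or classroom-level characteristic such as collaborative leadership. Treating the fall-score slope as fixed across units is also an assumption.

```latex
\begin{align*}
\text{Level 1:}\quad Y_{ij} &= \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}, \qquad r_{ij} \sim N(0, \sigma^2) \\
\text{Level 2:}\quad \beta_{0j} &= \gamma_{00} + \gamma_{01} W_j + u_{0j}, \qquad u_{0j} \sim N(0, \tau_{00}) \\
\beta_{1j} &= \gamma_{10} \\
\text{Unconditional:}\quad Y_{ij} &= \gamma_{00} + u_{0j} + r_{ij}, \qquad \rho = \frac{\tau_{00}}{\tau_{00} + \sigma^2}
\end{align*}
```

In the unconditional model, the ratio ρ (the intraclass correlation) is the proportion of total variance that lies between schools or classrooms, which is the quantity the variance partitioning described above is meant to estimate.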
Because of the improved estimation enabled by HLM, including the use of maximum likelihood and empirical Bayes estimates, interpretation of statistical results can be broadened to include a larger p-value associated with statistical tests. Furthermore, statistical results with p-values at or near 0.10 should be included in interpretation and explored in further studies with smaller numbers of cases (e.g., with fewer teachers or schools) because such results indicate that there are relationships which merit further exploration. For a more complete description of estimation of HLM, see Bryk and Raudenbush (1992, pp. 32-56). HLM is recognized as a standard program for estimating multi-level models (Bryk & Raudenbush, 1992; Kreft & De Leeuw, 1998).
School effectiveness. Students' fall and spring scores are presented in Table 7. We began by running a two-level HLM analysis investigating the relationship between students' spring fluency scores and school effectiveness factors (see Table 8). This analysis was based on data from 877 students across all 11 schools (including the students from the three schools that were in the study during both Years 1 and 2). Our HLM analysis revealed that, after accounting for fall scores, 29% of the variance in second- through sixth-graders' spring fluency scores (number of words read correctly in one minute) lay between schools. In other words, of the total variation in students' scores, 71% could be attributed to differences between students irrespective of school and 29% to between-school differences. Differences in the school-level factor of collaborative leadership in turn accounted for 24% of the between-school variance.
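One way to read the two percentages together is to multiply them: if 29% of the variance lies between schools and collaborative leadership accounts for 24% of that portion, then leadership is associated with roughly 7% of the total variance in spring fluency. The short sketch below works through this arithmetic. The function name is hypothetical, and the calculation is an interpretation of the reported figures rather than a result taken from the HLM output.

```python
def share_of_total_variance(between_share: float, explained_share_of_between: float) -> float:
    """Portion of total variance associated with a level-2 predictor."""
    for value in (between_share, explained_share_of_between):
        if not 0.0 <= value <= 1.0:
            raise ValueError("shares must lie between 0 and 1")
    return between_share * explained_share_of_between


between_school = 0.29                  # share of variance between schools (from the text)
within_school = 1.0 - between_school   # remaining share, between students within schools
leadership_share = share_of_total_variance(between_school, 0.24)

print(f"within schools: {within_school:.0%}, between schools: {between_school:.0%}")
print(f"collaborative leadership, share of total variance: {leadership_share:.1%}")
```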
Across all schools, the mean school fluency score was 104.5 words correct per minute. One way of gauging the influence of collaborative leadership is to note that for every additional point scored on the collaborative leadership scale, a school's mean fluency score showed an increase of 27.4 words correct per minute. If we note that students increased their scores by an average of 20 words correct per minute per year (see Table 7) and that school scores on the collaborative leadership scale ranged from 1.1 to 1.9 with a mean of 1.7 (out of a high score of 3), then we can surmise that, at least in principle, a school gaining one additional point on the collaborative leadership scale could make up a year's worth of fluency performance.
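The comparison above can be checked with a few lines of arithmetic. The sketch below uses only the figures reported in the text; the constant and function names are hypothetical. It assumes the association is linear across the scale, and it describes a correlational relationship, not a demonstrated causal effect.

```python
SLOPE_WCPM_PER_POINT = 27.4       # change in school mean fluency per leadership point
AVERAGE_YEARLY_GAIN_WCPM = 20.0   # average student gain per year (Table 7)
OBSERVED_MIN, OBSERVED_MAX = 1.1, 1.9  # observed range of school leadership scores


def predicted_mean_change(points_gained: float) -> float:
    """Predicted change in a school's mean fluency (words correct per minute)."""
    return SLOPE_WCPM_PER_POINT * points_gained


def years_of_growth(wcpm_change: float) -> float:
    """Express a fluency difference in units of average yearly growth."""
    return wcpm_change / AVERAGE_YEARLY_GAIN_WCPM


# A full one-point gain on the leadership scale.
one_point = years_of_growth(predicted_mean_change(1.0))
# The difference between the lowest- and highest-scoring schools observed.
observed_span = years_of_growth(predicted_mean_change(OBSERVED_MAX - OBSERVED_MIN))

print(f"one point on the scale: about {one_point:.2f} years of average growth")
print(f"observed range (1.1 to 1.9): about {observed_span:.2f} years of average growth")
```

Because the observed scores span only 0.8 points, a full one-point gain lies slightly outside the range of the data, which is one reason the text frames the comparison as holding "at least in principle."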
The HLM analysis of spring writing scores among second- through sixth-graders (see Table 9) revealed that the school-level factor of collaborative leadership accounted for 25% of the variance in students' spring writing scores after accounting for fall scores. Collaborative leadership scores accounted for 40% of the between-school variance.
Across all schools, the mean school writing score was 1.96 on a four-point scale. A way of gauging the importance of the 42% of variance contributed by collaborative leadership is to note that for every additional point on the collaborative leadership scale, a school's mean writing score showed an increase of 0.85 points. After including fall scores, HLM analyses investigating the relationship between school effectiveness scores or subscores (e.g., collaborative leadership) and students' spring reading comprehension were not significant.
*Only three of 11 schools had sixth-grade students. However, one of these schools had been with the project for 2 years, so 4 cohorts of grade 6 students were included in the data analyses.
As reported earlier, all teachers in each building had completed a two-part self-study questionnaire during Year 2. The first part of the questionnaire dealt with teachers' perceptions of various building- and classroom-level factors within their school; the second part dealt with their opinions of the school's needs with regard to its change efforts. In the first part of the questionnaire, teachers rated each of 38 items on a scale from 1 (strongly disagree) to 5 (strongly agree). These items dealt with school change, climate, and leadership; professional development; schoolwide decisions about reading instruction; classroom reading instruction; reading interventions for struggling readers; and school-home-community connections.
The school leadership ratings from our interviews were positively related to 12 of the 38 items from this first part of the questionnaire. The following is a list of the attributes which teachers saw as salient in cases where they perceived their schools to have strong leadership. The questionnaire items which were positively related to the interviews' school leadership ratings were:
Reform effort. We ran two-level HLM analyses investigating the relationship between students' spring comprehension, fluency, and writing scores and the school reform effort score for the total sample of 877 students in the 11 project schools (with data from three schools included separately across Years 1 and 2). HLM analyses investigating the relationship between the reform effort score and students' spring reading comprehension, fluency, or writing scores (after including the relevant fall scores) were not significant.
Using our coding of the 10 components of reform implementation, we were able to examine which components schools were implementing successfully. On the whole, we found that schools were having an easier time holding weekly study group meetings than they were holding monthly large-group meetings to share information across study groups and deal with schoolwide reform issues. Generally, schools were having an easier time meeting in grade-level groups than in cross-grade groups. Finally, schools were having an easier time reflecting on instruction and student work in study groups than they were focusing on research-based topics for periods of 3 months or longer. Most schools had not yet turned to the reform component of working with parents as partners (see Table 10).