For more context, The Washington Post features an op-ed today by Bill Gates, “A fairer way to evaluate teachers.”
Across the country, states and districts are putting in place long overdue teacher development and evaluation systems that will give teachers more tailored feedback about their practice based on multiple measures of performance.
As they do so, systems need to balance a sense of urgency with getting the measures right. For example, many states, districts, and teachers are working hard to develop new measures of student learning in grades and subjects not covered by state tests. These range from teacher-generated assessments, to tests that may not closely reflect the local curriculum, to the development of student learning objectives. Educators often face enormous pressure and rapid timelines to devise these measures. If we’re not careful, the results could add more confusion than clarity.
Policymakers could help remove that pressure by allowing states and districts some flexibility in how they evaluate teachers in non-tested subjects and use those results. This would enable sites to set more realistic timelines and share what they’re learning. Delaware; New Haven, CT; and Charlotte-Mecklenburg, NC, are in the vanguard of these efforts, to name a few. Why wouldn’t we create the opportunity to learn from these early front runners?
The primary purpose of teacher evaluation data should be to support teacher growth and improvement. Even in subjects with validated assessments, like literacy and math, test scores alone don’t indicate how effective a teacher has been. We need to be even more cautious about how we use new assessments without a proven track record. As states and districts consider how to use such information for personnel decisions, it’s common sense to make this transition carefully, gather data on how the measures perform, make adjustments as necessary, and give teachers and administrators time to adjust. It’s not our norm in education to iterate in this way, but we’ve already seen how successful this approach can be.
Hillsborough County, FL, for example, is gathering three years of evaluation data before using the results for high-stakes purposes—and it’s sharing what it’s learning with other Florida districts through the Florida Association of District School Superintendents. Similarly, the Denver Public Schools and the Pittsburgh Public Schools have significantly revised their evaluation systems based on data and feedback from teachers and principals.
Earlier this winter, the Gates Foundation released the final report from the Measures of Effective Teaching project—a large research study that validated multiple measures of teaching performance with the help of 3,000 teachers who opened up their classrooms for study. The foundation also released a set of guiding principles for implementing improvement-focused teacher evaluation systems, based both on the MET research and on the experiences of districts with whom we have worked over the past four years.
One set of principles focuses on ensuring high-quality data by using measures that are valid and reliable and that can be attributed accurately to the individual teacher. That’s important because teachers deserve measures that are fair, respectful of teaching, and that they can trust to provide useful information to support their improvement. The MET project identified three such measures—classroom observations of teaching practice; student surveys of the instructional environment; and student growth on both state tests in English language arts and mathematics and on more cognitively challenging supplemental assessments.
We understand that finding a fair and reliable way to measure effective teaching in non-tested grades and subjects is difficult and important. And we’re really excited by the progress we see toward more effective teacher support and evaluation systems around the country. But proceeding carefully in non-tested grades and subjects would give us time to assess these new measures so that they’re thoughtfully designed and can provide useful and unbiased feedback on both teacher and student performance.
In the interim, states and districts should not use yet-to-be-validated measures of student learning for high-stakes purposes for teachers. Nor should they hold teachers in non-tested grades and subjects accountable for math and ELA test results that can’t be properly attributed to their efforts. Instead, states and districts could consider weighting proven measures, such as classroom observations and student surveys, more heavily for such teachers. And they could use student learning objectives and other evidence of student learning for feedback and development only, until those measures are proven comparable and reliable across schools and classrooms.
It’s particularly important to be thoughtful now, as states and districts transition from their existing math and literacy tests to new assessments aligned with the Common Core State Standards. Teachers are just starting to shift their practices to reflect the new standards. We’re going to learn a lot in the next few years as teachers help create Common Core instructional materials, resources, and formative assessments in partnership with others. Getting the timing and sequencing right about how to connect these new assessments to teacher support and evaluation systems will be crucial. Let’s allow for the time needed before making quick decisions to meet an arbitrary deadline.
Using actual student growth as one measure of a teacher’s performance makes sense. After all, it’s why most teachers entered the profession in the first place: to help students learn. The foundation is committed to understanding how states and districts are gathering evidence of student learning in non-tested grades and subjects to identify places that are taking the most effective approaches and share their lessons more broadly.
Meanwhile, it’s sound practice and smart long-term policy to responsibly phase in new accountability metrics to ensure both their success and their survival. We fully expect states and districts to continue refining and improving their teacher development and evaluation systems over time, based on feedback, lessons learned, and evidence from their own data and from research.
The field should bring a similar mindset to judging how well teachers advance student learning in subjects beyond math and literacy. After all, teachers in non-tested grades and subjects also deserve measures they can trust.