Bill & Melinda Gates Foundation

A Dialogue: How Do We Consider Evidence of Student Learning in Teacher Evaluation?

August 07, 2012

This post originally appeared on Anthony Cody's blog, Living in Dialogue. It is the second in a series of weekly posts, running over five consecutive weeks, between teacher Anthony Cody and various members of the US education team at the foundation. The first dialogue, between Cody and Irvin Scott, Deputy Director of the College Ready team at the foundation, is here.

Education debates are often characterized wrongly as two warring camps: blame teachers for everything that’s not working in our schools or defend all teachers at all costs.

But there’s actually serious work going on in the middle, where there’s a lot of common purpose around helping teachers improve their practice and students improve their learning. The fundamental question is how we can reliably measure learning and use a range of quality feedback to give teachers the support they need to keep improving.

Not only do students have a right to effective instruction, good teachers value good teaching—they want to do their very best, and they welcome feedback to hone their craft.

That’s why, in 2009, the Bill & Melinda Gates Foundation, together with some 3,000 teacher volunteers, launched the Measures of Effective Teaching study, to identify effective teaching based on multiple measures of performance, not just test scores. And it’s why the foundation has invested heavily in a set of partnership sites that have been redesigning how they evaluate and support teaching talent throughout a teacher’s career.

Multiple measures are necessary because teaching is complex. Good teaching involves knowing the content and how to teach it, building a strong trusting relationship with students, setting and supporting high expectations, and continuously monitoring students’ understanding and adjusting instruction accordingly.

The notions that student learning should play no part in teacher evaluation systems, or that test scores should be the only measure of teaching performance, represent two extreme and unproductive positions.

Student learning has to be part of a teacher evaluation system because advancing students’ learning is a central goal of great teaching. But using gains on annual test scores as the sole measure of teaching performance has huge drawbacks.

First, the tests say how the students are performing too late for the teacher to do anything about it. Second, annual tests are not diagnostic. If the scores are high, they don’t tell us what the teacher did well. If the scores are low, they don’t tell us what the teacher could do better. Third, the scores simply aren’t available for the vast majority of teachers. Teaching is part art, part science – there are lots of great things that teachers do which will never be captured on a test.

That’s why the MET study examined a range of measures designed to capture teaching’s complexity, including: classroom observations, students’ perceptions of the instructional environment, and students’ achievement growth over time as measured by value-added calculations that take into account students’ different starting points. Each has strengths and shortcomings. Classroom observations, for example, can provide teachers with detailed feedback about their instruction and identify opportunities for improvement. But because they only occur a few times a year, they are less reliable than the feedback from many students who have experienced a teacher’s instruction all year long. Measures of the learning gains experienced by students in a teacher’s classroom are the most predictive of whether a teacher will achieve similar learning gains with a different group of students. But such measures, commonly based on end-of-year state tests, provide teachers with too little information too late about what to do differently, and may not reflect the full breadth and depth of instruction. The MET project found the greatest predictive power, reliability, and potential usefulness for professional development when these measures were combined. Moreover, we know from surveys of teachers that they have more trust in multiple measures than in any single measure used in isolation.  
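To make the idea of combining measures concrete, here is a minimal sketch of a weighted composite. It is not the MET project's method: the function name `composite_score`, the measure names, the weights, and the sample values are all hypothetical, and the sketch assumes each measure has already been put on a common scale where 0 is average.

```python
def composite_score(measures, weights):
    """Weighted average of measures that share a common scale.

    Both arguments are dicts keyed by measure name. Weights are
    normalized here, so they do not need to sum to 1.
    """
    if set(measures) != set(weights):
        raise ValueError("measures and weights must cover the same names")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(measures[name] * weights[name] for name in measures) / total


# Hypothetical teacher, illustrative values only (0 = average).
measures = {"observation": 0.4, "student_survey": 0.1, "value_added": -0.2}

# Two hypothetical weighting schemes a district might compare.
equal_weights = {"observation": 1, "student_survey": 1, "value_added": 1}
observation_heavy = {"observation": 2, "student_survey": 1, "value_added": 1}

print(composite_score(measures, equal_weights))
print(composite_score(measures, observation_heavy))
```

The choice of weights is a policy decision rather than a technical one, and a formula like this leaves no room for the professional judgment that evaluation also requires.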

The MET study also found that the current state tests, on which most value-added measures are based, need to be improved, particularly in English Language Arts. The study looked at the relationship between teachers’ scores on classroom observation rubrics and their students’ scores on both state ELA tests and the SAT 9/Open-Ended, which actually asks students to write out their responses to test questions. We found that the SAT 9/Open-Ended correlated far better with teachers’ classroom practice than the state ELA tests did, which suggests the state tests may not be capturing what students are actually learning in language arts classrooms.

One cause for optimism is that the new assessments being developed by the two multi-state assessment consortia, based on the Common Core State Standards, will be much more similar to the SAT 9/Open-Ended than to existing state tests. In the meantime, states and districts could consider weighing student gains on state ELA tests less heavily than they weigh gains on state math tests. They also could provide room for sound judgment, rather than relying solely on a mathematical algorithm to combine different aspects of teaching performance. Particularly when it comes to making important personnel decisions, principals should be able to weigh in, and teachers should be able to appeal the final rating, based on additional evidence.

The most important question, however, is not how evidence of student learning should be used in teacher evaluation, but how teacher evaluations themselves should be used.

And there, the focus must lie firmly on development. The reality is that few teachers are remarkably effective or ineffective, as confirmed by the MET study. Most teachers are professionals trying to get better at their craft. We owe it to them—and their students—to help.

The primary purpose of teacher evaluation systems isn’t to identify the small percentage of teachers who should choose another calling, or even those whose practice should be celebrated and spread (although we certainly want to do that). The purpose is to get teachers the targeted, personalized feedback and professional development they need and want, tied to more detailed information about their teaching, so that they can continue to improve collectively and individually.

 

 

 

 