Bill & Melinda Gates Foundation

Keeping Score: What the Teaching Profession Can Learn From Baseball

December 05, 2011

I evaluate teachers for a living. When I visit a classroom, I watch a lesson, discuss what I observed, share recommendations, and get additional feedback from the teacher. This visit—along with a principal’s evaluation—determines about 60 percent of the teacher’s “annual effectiveness rating.”

The remaining 40 percent is determined by a complex calculation of students’ progress on an annual test, as compared to the performance of students with similar characteristics.

This is a far better system than has ever been used before. But it is just a baby step in gauging teacher effectiveness.

Baseball is a great analogy for teaching. Baseball works, in part, because the aim of the game is simple: to score more runs than the other team. Anything that improves the team's chances of doing that is valuable. Anything that hurts those chances should be reduced or eliminated.

In baseball, managers make decisions based on a huge amount of statistical analysis (now called sabermetrics), thanks to technological tools and ready access to data. In the past twenty years, in fact, baseball has been transformed: the impressions and conventional wisdom of tobacco-chewing, road-warrior baseball scouts have been replaced by the results-oriented, proven methods of sabermetrics.

Sabermetrics exists only because baseball statistics are “open source.” We all have access to them. For many years we didn't. In the 19th century, baseball aficionado Henry Chadwick made value judgments about which statistics mattered. He created a scorecard highlighting those few measures, and his system was used for about a hundred years. Then people started challenging his assumptions, like the idea that “walks” aren't important, and baseball statistics took a huge leap forward.

Do we have all the statistics of teaching? Is the profession transparent? Do we have access to the data? The answer to all of these questions is “no.”

How badly would baseball talent be misjudged if only five of the 180 games that a team plays between March and October were considered? What if the players knew that two of those games were formal evaluation days? Would they play differently?

It isn't enough to just read the box score, watch three to five games, and pick the all-star team. You need to consider the whole game, and the whole season.

So how do we collect more and better statistics?

Clearly, we must bring what happens in the classroom into the light of day. We must develop meaningful statistical measures that correlate with teacher effectiveness and give us a comprehensive view of it.

At the same time, it is important to balance this with the imperative to treat teachers as professionals.

Teachers deserve autonomy and the ability to do their jobs unimpeded. When it comes to classroom visits, for example, there should be clear rules to ensure they occur in a limited and non-disruptive way.

In baseball, isolated measurements can be wrong. Umpires miss the call sometimes. Similarly, many teachers are concerned that a measurement or subjective assessment of their effectiveness will be inaccurate. There is good reason to question the way things are done and to improve the metrics.

Still, accurate pictures of teaching quality can be developed over time. A bad call eventually gets balanced by an equally unfair good call. A batting average (or more importantly, an on-base percentage) emerges.
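To make the point about measurement over time concrete, here is a rough sketch in Python. It is only an illustration under assumptions of my own: a hypothetical hitter whose true hit rate is .300, four at-bats per game, and every at-bat treated as an independent coin flip. None of these numbers come from real player data. The sketch compares how much an observed batting average wanders when it is based on five games versus 180 games.

```python
import random
import statistics


def observed_average(true_rate, at_bats, rng):
    """Simulate at-bats as independent trials and return hits / at-bats."""
    hits = sum(1 for _ in range(at_bats) if rng.random() < true_rate)
    return hits / at_bats


def spread_of_estimates(true_rate, games, at_bats_per_game=4, trials=2000, seed=0):
    """Standard deviation of the observed average across many simulated samples.

    All parameters are illustrative assumptions, not real baseball data.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    at_bats = games * at_bats_per_game
    estimates = [observed_average(true_rate, at_bats, rng) for _ in range(trials)]
    return statistics.pstdev(estimates)


few_games_spread = spread_of_estimates(0.300, games=5)
full_season_spread = spread_of_estimates(0.300, games=180)

print(f"Spread after 5 games:   {few_games_spread:.3f}")
print(f"Spread after 180 games: {full_season_spread:.3f}")
```

Under these assumptions, simple binomial arithmetic puts the spread at roughly 0.10 for five games and roughly 0.017 for 180 games, so a .300 hitter could easily look like a .200 or a .400 hitter in a five-game sample. The same logic is why a handful of classroom visits may say little about a whole school year.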

For the sake of the many teachers who teach effectively every day, and for the sake of the kids in classrooms where teachers do their jobs effectively only a handful of days each year, I hope we get the rest of the statistics soon.

 